Abstract

This paper studies the construction of a refinement kernel for a given operator-valued reproducing
kernel such that the vector-valued reproducing kernel Hilbert space of the refinement kernel
contains that of the given one as a subspace. The study is motivated by the need to update the
current operator-valued reproducing kernel in multi-task learning when underfitting or overfitting
occurs. Numerical simulations confirm that the established refinement kernel method is able to
meet this need. Various characterizations are provided based on feature maps and vector-valued
integral representations of operator-valued reproducing kernels. Concrete examples of refining
translation-invariant and finite Hilbert-Schmidt operator-valued reproducing kernels are provided.
Other examples include refinement of Hessians of scalar-valued translation-invariant kernels and
transformation kernels. Existence and properties of operator-valued reproducing kernels preserved
during the refinement process are also investigated.

Keywords: vector-valued reproducing kernel Hilbert spaces, vector-valued reproducing kernels, re-

1 Introduction

Machine learning designs algorithms for the purpose of inferring from finite empirical data a function
dependency which can then be used to understand or predict the generation of new data. Past research
has mainly focused on single task learning problems where the function to be learned is scalar-valued.
Built upon the theory of scalar-valued reproducing kernels [1], kernel methods have proven useful in
single task learning [26, 27, 28]. The approach might be justified in three ways. Firstly, as inputs
for learning algorithms are sample data, requiring the sampling process to be stable seems inevitable.
Thanks to the existence of an inner product, Hilbert spaces are the class of normed vector spaces
that we can handle best. These two considerations lead immediately to the notion of reproducing
kernel Hilbert spaces (RKHS). Secondly, a reasonable learning scheme is expected to make use of
the similarity between a new input and the existing inputs for prediction. Inner products provide
a natural measurement of similarities. It is well-known that a bivariate function is a scalar-valued
reproducing kernel if and only if it is representable as some inner product of the features of inputs [26].
Finally, finding a feature map and taking the inner product of the features of two inputs are equivalent
to choosing a scalar-valued reproducing kernel and performing function evaluations of it. This brings
computational efficiency and gives birth to the important "kernel trick" [26] in machine learning. For
references on single task learning and scalar-valued RKHS, we recommend [H, 10, 11, 15, 26, 27, 28, 33].

In this paper, we are concerned with multi-task learning where the function to be reconstructed
from finite sample data takes values in a finite-dimensional Euclidean space, or more generally, a Hilbert
space. Motivated by the success of kernel methods in single task learning, it was proposed in [14, 20]
to develop algorithms for multi-task learning in the framework of vector-valued RKHS. We attempt
to contribute to the theory of vector-valued RKHS by studying a special embedding relationship
between two vector-valued RKHS. We shall briefly review existing work on vector-valued RKHS and the
associated operator-valued reproducing kernels. The study of vector-valued RKHS dates back to [23].
The notion of matrix-valued or operator-valued reproducing kernels was also obtained in [5].
References [22, 23, 32] were devoted to learning a multi-variate function and its gradient simultaneously.
Reference [^ established the Mercer theorem for vector-valued RKHS and characterized those spaces 
with elements being p-integrable vector-valued functions. Various characterizations and examples of 
universal operator-valued reproducing kernels were provided in [6, 8]. The latter [8] also examined
basic operations of operator-valued reproducing kernels and extended the Bochner characterization of 
translation invariant reproducing kernels to the operator-valued case. 

The purpose of this paper is to study the refinement relationship of two vector-valued reproducing
kernels. We say that a vector-valued reproducing kernel is a refinement of another kernel of such type
if the RKHS of the first kernel contains that of the latter one as a linear subspace and their norms
coincide on the smaller space. The precise definition will be given in the next section after we provide
necessary preliminaries on vector-valued RKHS. The study is motivated by the need to update
a vector-valued reproducing kernel for multi-task machine learning when underfitting or overfitting
occurs. Detailed explanations of this motivation will be presented in the next section. Mathematically,
a thorough understanding of the refinement relationship is essential to the establishment of a
multi-scale decomposition of vector-valued RKHS, which in turn is the foundation for extending multi-scale
analysis [12, 19] to kernel methods. In fact, a special refinement method by a bijective mapping from
the input space to itself provides such a decomposition. As the procedure is similar to the
scalar-valued case, we refer interested readers to [30] for the details. The notion of refinement of
scalar-valued kernels was initiated and extensively investigated by the first two authors [30, 31]. Therefore, a
general principle we shall follow is to briefly mention or even completely omit arguments that are not
essentially different from the scalar-valued case. As we proceed with the study, it will become clear that
nontrivial obstacles in extending the scalar-valued theory to vector-valued RKHS are mainly caused by
the complexity of the vector-valued integral representation of the operator-valued reproducing kernels
under investigation, by the complicated form of the feature map involved, which is also
operator-valued, and by the infinite-dimensionality of the output space on some occasions.

This paper is organized as follows. We shall introduce necessary preliminaries on vector-valued
RKHS and motivate our study from multi-task learning in the next section. In Section 3, we shall
present three general characterizations of the refinement relationship by examining the difference of
two given kernels, the feature map representation of kernels, and the associated kernels on the
extended input space. Recall that most scalar-valued reproducing kernels are represented by integrals.
In the operator-valued case, we have two types of integral representations: the integral of
operator-valued reproducing kernels with respect to a scalar-valued measure, and the integral of scalar-valued
reproducing kernels with respect to an operator-valued measure. As a key part of this paper, we
shall investigate in Section 4 specifications of the general characterizations when the operator-valued
reproducing kernels are given by such integrals. In Section 5, we present concrete examples of
refinement by looking into translation-invariant operator-valued kernels, Hessians of scalar-valued kernels,
Hilbert-Schmidt kernels, etc. Section 6 treats specially the existence of nontrivial refinements and
desirable properties of operator-valued reproducing kernels that can be preserved during the refinement
process. In Section 7, we perform two numerical simulations to show the effect of the refinement kernel
method in updating operator-valued reproducing kernels for multi-task learning. Finally, we conclude
the paper in Section 8.

2 Kernel Refinement 

To explain our motivation from multi-task learning in detail, we first recall the definition of
operator-valued reproducing kernels. Throughout the paper, we let $X$ and $\Lambda$ denote a prescribed set and a
separable Hilbert space, respectively. We shall call $X$ the input space and $\Lambda$ the output space. To
avoid confusion, elements in $X$ and $\Lambda$ will be denoted by $x, y$ and $\xi, \eta$, respectively. Unless specifically
mentioned, all the normed vector spaces in the paper are over the field $\mathbb{C}$ of complex numbers. Let
$\mathcal{L}(\Lambda)$ be the set of all the bounded linear operators from $\Lambda$ to $\Lambda$, and $\mathcal{L}_+(\Lambda)$ its subset of those linear
operators $A$ that are positive, namely,

$$(A\xi, \xi)_\Lambda \ge 0 \quad \text{for all } \xi \in \Lambda,$$

where $(\cdot, \cdot)_\Lambda$ is the inner product on $\Lambda$. The adjoint of $A \in \mathcal{L}(\Lambda)$ is denoted by $A^*$. An $\mathcal{L}(\Lambda)$-valued
reproducing kernel on $X$ is a function $K : X \times X \to \mathcal{L}(\Lambda)$ such that $K(x, y) = K(y, x)^*$ for all $x, y \in X$,
and such that for all $x_j \in X$, $\xi_j \in \Lambda$, $j \in \mathbb{N}_n := \{1, 2, \ldots, n\}$, $n \in \mathbb{N}$,

$$\sum_{j=1}^{n} \sum_{k=1}^{n} (K(x_j, x_k)\xi_j, \xi_k)_\Lambda \ge 0. \tag{2.1}$$
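
As a numerical illustration of condition (2.1), the following sketch assembles the block matrix whose $(k, j)$ block is $K(x_j, x_k)$ and checks that it is positive semi-definite. The separable kernel $K(x, y) = e^{-|x-y|^2} B$, the helper names `separable_kernel` and `block_gram`, and the sample points are assumptions chosen for illustration only; they do not come from the paper.

```python
import numpy as np

def separable_kernel(x, y, B):
    """Hypothetical example kernel K(x, y) = exp(-|x - y|^2) * B, with B Hermitian PSD."""
    return np.exp(-abs(x - y) ** 2) * B

def block_gram(kernel, xs):
    """Block matrix M whose block (k, j) equals K(x_j, x_k).

    For the stacked vector xi = (xi_1, ..., xi_n), the double sum in (2.1)
    equals xi^* M xi, so (2.1) holds exactly when M is positive semi-definite.
    """
    n = len(xs)
    d = kernel(xs[0], xs[0]).shape[0]
    M = np.zeros((n * d, n * d), dtype=complex)
    for j, xj in enumerate(xs):
        for k, xk in enumerate(xs):
            M[k * d:(k + 1) * d, j * d:(j + 1) * d] = kernel(xj, xk)
    return M

B = np.array([[2.0, 1.0j], [-1.0j, 2.0]])  # Hermitian with eigenvalues 1 and 3
xs = [0.0, 0.7, 1.5, 3.0]                  # arbitrary sample inputs
M = block_gram(lambda x, y: separable_kernel(x, y, B), xs)
min_eig = np.linalg.eigvalsh(M).min()
print("smallest eigenvalue of the block Gram matrix:", min_eig)
```

A nonnegative smallest eigenvalue (up to rounding) is what (2.1) requires for this particular choice of points; it is a sanity check rather than a proof.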

For each $\mathcal{L}(\Lambda)$-valued reproducing kernel $K$ on $X$, there exists a unique Hilbert space, denoted by
$\mathcal{H}_K$, consisting of $\Lambda$-valued functions on $X$ such that

$$K(x, \cdot)\xi \in \mathcal{H}_K \quad \text{for all } x \in X \text{ and } \xi \in \Lambda \tag{2.2}$$

and

$$(f(x), \xi)_\Lambda = (f, K(x, \cdot)\xi)_{\mathcal{H}_K} \quad \text{for all } f \in \mathcal{H}_K,\ x \in X, \text{ and } \xi \in \Lambda. \tag{2.3}$$

It is implied by the above two properties that the point evaluation at each $x \in X$,

$$\delta_x(f) := f(x), \quad f \in \mathcal{H}_K,$$

is continuous from $\mathcal{H}_K$ to $\Lambda$. In other words, $\mathcal{H}_K$ is a $\Lambda$-valued RKHS. We call it the RKHS of $K$.
Conversely, for each $\Lambda$-valued RKHS on $X$, there exists a unique $\mathcal{L}(\Lambda)$-valued reproducing kernel $K$
on $X$ that satisfies (2.2) and (2.3). For this reason, we also call $K$ the reproducing kernel (or kernel for
short) of $\mathcal{H}_K$. The bijective correspondence between $\mathcal{L}(\Lambda)$-valued reproducing kernels and $\Lambda$-valued
RKHS is central to the theory of vector-valued RKHS.

Given two $\mathcal{L}(\Lambda)$-valued reproducing kernels $K, G$ on $X$, we shall investigate in this paper the
fundamental embedding relationship $\mathcal{H}_K \preceq \mathcal{H}_G$ in the sense that $\mathcal{H}_K \subseteq \mathcal{H}_G$ and for all $f \in \mathcal{H}_K$,
$\|f\|_{\mathcal{H}_K} = \|f\|_{\mathcal{H}_G}$. Here, $\|\cdot\|_{\mathcal{W}}$ denotes the norm of a normed vector space $\mathcal{W}$. We call $G$ a refinement
of $K$ if $\mathcal{H}_K \preceq \mathcal{H}_G$ holds. Such a refinement is said to be nontrivial if $G \ne K$.

We motivate this study from the kernel methods for multi-task learning and from the multi-scale
decomposition of vector-valued RKHS. Let $\mathbf{z} := \{(x_j, \xi_j) : j \in \mathbb{N}_n\} \subseteq X \times \Lambda$ be given sample data. A
typical kernel method infers from $\mathbf{z}$ the minimizer $f_{\mathbf{z}}$ of

$$\frac{1}{n} \sum_{j=1}^{n} \mathcal{C}(x_j, \xi_j, f(x_j)) + \sigma \phi(\|f\|_{\mathcal{H}_K}), \tag{2.4}$$

where $K$ is a selected $\mathcal{L}(\Lambda)$-valued reproducing kernel on $X$, $\mathcal{C}$ a prescribed loss function, $\sigma$ a positive
regularization parameter, and $\phi$ a regularizer. The ideal predictor $f_0 : X \to \Lambda$ that we are pursuing is
the one that minimizes

$$\mathcal{E}(f) := \int_{X \times \Lambda} \mathcal{C}(x, \xi, f(x)) \, dP$$

among all possible functions $f$ from $X$ to $\Lambda$. Here $P$ is an unknown probability measure on $X \times \Lambda$
that governs the generation of data from $X \times \Lambda$. We hope that $\mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}(f_0)$ converges to zero
in probability as the number $n$ of sampling points tends to infinity. Whether this will happen depends
heavily on the choice of the kernel $K$. The error $\mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}(f_0)$ can be decomposed into the sum of the
approximation error and the sampling error [28]. The approximation error occurs as we search for the
minimizer in a restricted set of candidate functions, namely, $\mathcal{H}_K$. It becomes smaller as $\mathcal{H}_K$ enlarges.
The sampling error is caused by replacing the expectation $\mathcal{E}(f)$ of the loss function $\mathcal{C}(x, \xi, f(x))$ with
the sample mean

$$\frac{1}{n} \sum_{j=1}^{n} \mathcal{C}(x_j, \xi_j, f(x_j)).$$

By the law of large numbers, the sample mean converges to the expectation in probability as $n \to \infty$ for
a fixed $f \in \mathcal{H}_K$. However, as $f_{\mathbf{z}}$ varies according to changes in the sample data $\mathbf{z}$, we need a uniform
version of the law of large numbers on $\mathcal{H}_K$ in order to control the sampling error well. Therefore,
the sampling error usually increases as $\mathcal{H}_K$ enlarges, or to be more precise, as the capacity of $\mathcal{H}_K$
increases.

By the above analysis, we might encounter two situations after the choice of an $\mathcal{L}(\Lambda)$-valued
reproducing kernel $K$:

- overfitting, which occurs when the capacity of $\mathcal{H}_K$ is too large, forcing the minimizer obtained
from (2.4) to imitate artificial function dependency in the sample data, and thus causing the
sampling error to be out of control;

- underfitting, which occurs when $\mathcal{H}_K$ is too small for the minimizer of (2.4) to describe the desired
function dependency implied in the data, and thus failing to bound the approximation error.

When one of the above situations happens, a remedy is to modify the reproducing kernel. Specifically,
one might want to find another $\mathcal{L}(\Lambda)$-valued reproducing kernel $G$ such that $\mathcal{H}_K \preceq \mathcal{H}_G$ when there is
underfitting, or such that $\mathcal{H}_G \preceq \mathcal{H}_K$ when there is overfitting. We see that in either case, we need to
make use of the refinement relationship. We shall verify in Section 7 through numerical
simulations that the refinement kernel method is indeed able to provide an appropriate update of an
operator-valued reproducing kernel when underfitting or overfitting occurs.
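
To make the regularization scheme (2.4) concrete, the sketch below solves it in one simple setting: the square loss, the regularizer $\phi(t) = t^2$, and output space $\mathbb{R}^d$. In that setting the minimizer can be written as $\sum_j K(x_j, \cdot) c_j$, with coefficients obtained from a linear system. The separable Gaussian kernel $K(x, y) = e^{-(x-y)^2} B$, the function names, and the data are all illustrative assumptions; the simulations reported in the paper may be set up differently.

```python
import numpy as np

def gaussian_gram(xs, ys):
    """Scalar Gaussian Gram matrix with entries exp(-(x - y)^2)."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    return np.exp(-(xs[:, None] - ys[None, :]) ** 2)

def fit(xs, targets, B, sigma):
    """Coefficients c_j of f = sum_j K(x_j, .) c_j for K(x, y) = exp(-(x - y)^2) B.

    Assumes square loss and regularizer phi(t) = t^2, in which case the
    coefficients solve (G + n * sigma * I) c = xi, G being the block Gram matrix.
    """
    n, d = targets.shape
    G = np.kron(gaussian_gram(xs, xs), B)  # block (i, j) is k(x_i, x_j) * B
    c = np.linalg.solve(G + n * sigma * np.eye(n * d), targets.reshape(-1))
    return c.reshape(n, d)

def predict(xs_train, coeffs, B, xs_new):
    """Evaluate f(x) = sum_j k(x_j, x) B c_j at each new input x."""
    return gaussian_gram(xs_new, xs_train) @ coeffs @ B.T

xs = np.arange(6.0)                                  # toy inputs
targets = np.column_stack([np.sin(xs), np.cos(xs)])  # two related tasks
B = np.array([[1.0, 0.5], [0.5, 1.0]])               # assumed task-coupling matrix
coeffs = fit(xs, targets, B, sigma=1e-10)
fitted = predict(xs, coeffs, B, xs)
```

With a very small $\sigma$ the fitted values nearly interpolate the data, which is the overfitting regime described above; increasing $\sigma$ or shrinking the kernel's RKHS moves toward underfitting.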

3 General Characterizations

The relationship between the RKHS of the sum of two operator-valued reproducing kernels and those
of the summand kernels has been made clear in Theorem 1 on page 44 of [53]. Our first characterization
of refinement is a direct consequence of this result.

Proposition 3.1 Let $K, G$ be two $\mathcal{L}(\Lambda)$-valued reproducing kernels on $X$. Then $\mathcal{H}_K \preceq \mathcal{H}_G$ if and
only if $G - K$ is an $\mathcal{L}(\Lambda)$-valued reproducing kernel on $X$ and $\mathcal{H}_K \cap \mathcal{H}_{G-K} = \{0\}$. If $\mathcal{H}_K \preceq \mathcal{H}_G$ then
$\mathcal{H}_{G-K}$ is the orthogonal complement of $\mathcal{H}_K$ in $\mathcal{H}_G$.

Every reproducing kernel has a feature map representation. Specifically, $K$ is an $\mathcal{L}(\Lambda)$-valued
reproducing kernel on $X$ if and only if there exists a Hilbert space $\mathcal{W}$ and a mapping $\Phi : X \to \mathcal{L}(\Lambda, \mathcal{W})$
such that

$$K(x, y) = \Phi(y)^* \Phi(x), \quad x, y \in X, \tag{3.1}$$

where $\mathcal{L}(\Lambda, \mathcal{W})$ denotes the set of bounded linear operators from $\Lambda$ to $\mathcal{W}$, and $\Phi(y)^*$ is the adjoint
operator of $\Phi(y)$. We call $\Phi$ a feature map of $K$. The following lemma is useful in identifying the
RKHS of a reproducing kernel given by a feature map representation (3.1).

Lemma 3.2 If $K$ is an $\mathcal{L}(\Lambda)$-valued reproducing kernel on $X$ given by (3.1) then

$$\mathcal{H}_K = \{\Phi(\cdot)^* u : u \in \mathcal{W}\} \tag{3.2}$$

with inner product

$$(\Phi(\cdot)^* u, \Phi(\cdot)^* v)_{\mathcal{H}_K} := (P_\Phi u, P_\Phi v)_{\mathcal{W}}, \quad u, v \in \mathcal{W},$$

where $P_\Phi$ is the orthogonal projection of $\mathcal{W}$ onto

$$\mathcal{W}_\Phi := \overline{\operatorname{span}}\{\Phi(x)\xi : x \in X,\ \xi \in \Lambda\}.$$

The second characterization can be proved using Lemma 3.2 and the same arguments as those
for the scalar-valued case [30].

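Theorem 3.3 Suppose that $\mathcal{L}(\Lambda)$-valued reproducing kernels $K$ and $G$ are given by the feature maps
$\Phi : X \to \mathcal{L}(\Lambda, \mathcal{W})$ and $\Phi' : X \to \mathcal{L}(\Lambda, \mathcal{W}')$, respectively. Assume that $\mathcal{W}_\Phi = \mathcal{W}$ and $\mathcal{W}'_{\Phi'} = \mathcal{W}'$. Then
$\mathcal{H}_K \preceq \mathcal{H}_G$ if and only if there exists a bounded linear operator $T$ from $\mathcal{W}'$ to $\mathcal{W}$ such that

$$T\Phi'(x) = \Phi(x) \quad \text{for all } x \in X, \tag{3.3}$$

and the adjoint operator $T^* : \mathcal{W} \to \mathcal{W}'$ is isometric. In this case, $G$ is a nontrivial refinement of $K$ if
and only if $T$ is not injective.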

To illustrate the above useful results, we shall present a concrete example aiming at refining
$\mathcal{L}(\Lambda)$-valued reproducing kernels $K$ with a finite-dimensional RKHS. A simple observation is made regarding
such a kernel.

Proposition 3.4 A $\Lambda$-valued RKHS $\mathcal{H}_K$ is of finite dimension $n \in \mathbb{N}$ if and only if there exists an $n \times n$
hermitian and strictly positive-definite matrix $A$ and $n$ linearly independent functions $\phi_j : X \to \Lambda$,
$j \in \mathbb{N}_n$ such that

$$K(x, y)\xi = \sum_{j=1}^{n} \sum_{k=1}^{n} A_{jk} (\xi, \phi_j(x))_\Lambda \phi_k(y), \quad x, y \in X,\ \xi \in \Lambda. \tag{3.4}$$
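
Proof: Assume that $\mathcal{H}_K$ is $n$ dimensional with orthonormal basis $\{\phi_j : j \in \mathbb{N}_n\}$. As $K(x, \cdot)\xi \in \mathcal{H}_K$ for
all $x \in X$, $\xi \in \Lambda$, there exist functions $c_j : \Lambda \times X \to \mathbb{C}$ such that

$$K(x, y)\xi = \sum_{j=1}^{n} c_j(\xi, x) \phi_j(y), \quad x, y \in X,\ \xi \in \Lambda.$$

Since $\{\phi_j : j \in \mathbb{N}_n\}$ is a basis for $\mathcal{H}_K$, each function $f \in \mathcal{H}_K$ has the form

$$f = \sum_{j=1}^{n} d_j \phi_j, \quad d_j \in \mathbb{C},\ j \in \mathbb{N}_n.$$

Clearly, $\|f\| := \left(\sum_{j=1}^{n} |d_j|^2\right)^{1/2}$ is a norm on $\mathcal{H}_K$. It is equivalent to the original one on $\mathcal{H}_K$ as
$\dim \mathcal{H}_K < \infty$. It is implied that there exists some $C > 0$ such that

$$\sum_{j=1}^{n} |c_j(\xi, x)|^2 \le C \|K(x, \cdot)\xi\|_{\mathcal{H}_K}^2 = C (K(x, x)\xi, \xi)_\Lambda \le C \|\xi\|_\Lambda^2 \|K(x, x)\|. \tag{3.5}$$

Obviously, for each $x \in X$ and $j \in \mathbb{N}_n$, $c_j(\cdot, x)$ is a linear functional on $\Lambda$. This together with (3.5)
implies that $c_j(\cdot, x)$ are bounded linear functionals on $\Lambda$. By the Riesz representation theorem, there
exist $\psi_j : X \to \Lambda$, $j \in \mathbb{N}_n$ such that

$$c_j(\xi, x) = (\xi, \psi_j(x))_\Lambda.$$

We conclude that $K$ has the form

$$K(x, y)\xi = \sum_{j=1}^{n} (\xi, \psi_j(x))_\Lambda \phi_j(y), \quad x, y \in X,\ \xi \in \Lambda. \tag{3.6}$$

Since $\{\phi_j : j \in \mathbb{N}_n\}$ is an orthonormal basis for $\mathcal{H}_K$, by (2.3),

$$(\xi, \psi_j(x))_\Lambda = (K(x, \cdot)\xi, \phi_j)_{\mathcal{H}_K} = (\xi, \phi_j(x))_\Lambda, \quad \xi \in \Lambda,\ x \in X.$$

It follows that $\psi_j = \phi_j$, $j \in \mathbb{N}_n$. Substituting this into (3.6) yields that

$$K(x, y)\xi = \sum_{j=1}^{n} (\xi, \phi_j(x))_\Lambda \phi_j(y), \quad x, y \in X,\ \xi \in \Lambda,$$

which indeed is a special form of (3.4).

Conversely, assume that $K$ has the form (3.4). We set $\mathcal{W}_A := \ell_A^2(\mathbb{N}_n) := \{c = (c_j : j \in \mathbb{N}_n) \in \mathbb{C}^n\}$
with inner product

$$(c, d)_{\mathcal{W}_A} := \sum_{j=1}^{n} \sum_{k=1}^{n} A_{jk} c_j \overline{d_k}.$$

Introduce $\Phi : X \to \mathcal{L}(\Lambda, \mathcal{W}_A)$ by setting $\Phi(x)\xi := ((\xi, \phi_j(x))_\Lambda : j \in \mathbb{N}_n)$. Direct computations show
that

$$\Phi(x)^* u = \sum_{j=1}^{n} \sum_{k=1}^{n} \phi_j(x) u_k \overline{A_{jk}}, \quad u = (u_j : j \in \mathbb{N}_n) \in \mathcal{W}_A.$$

Thus, we see that $K(x, y) = \Phi(y)^* \Phi(x)$, $x, y \in X$, implying that $K$ is an $\mathcal{L}(\Lambda)$-valued reproducing
kernel. By the linear independence of $\phi_j$, $j \in \mathbb{N}_n$, $\operatorname{span}\{\Phi(x)\xi : x \in X,\ \xi \in \Lambda\} = \mathcal{W}_A$. We hence apply
Lemma 3.2 to get that

$$\mathcal{H}_K = \{\Phi(\cdot)^* u : u \in \mathcal{W}_A\} = \operatorname{span}\{\phi_j : j \in \mathbb{N}_n\},$$

which is of dimension $n$. □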

By the above proposition, we let $\phi_j$, $j \in \mathbb{N}_m$ be linearly independent functions from $X$ to $\Lambda$,
where $m \ge n$ are fixed positive integers. Let $A$ and $B$ be $n \times n$ and $m \times m$ hermitian and strictly
positive-definite matrices, respectively. We define $K$ by (3.4) in terms of matrix $A$ and $G$ by

$$G(x, y)\xi := \sum_{j=1}^{m} \sum_{k=1}^{m} B_{jk} (\xi, \phi_j(x))_\Lambda \phi_k(y), \quad x, y \in X, \tag{3.7}$$

and shall investigate conditions for $G$ to be a refinement of $K$.

Proposition 3.5 Let $K$, $G$ be defined by (3.4) and (3.7), respectively. Then $\mathcal{H}_K \preceq \mathcal{H}_G$ if and only if
$B^{-1}$ is an augmentation of $A^{-1}$, namely, $(B^{-1})_{jk} = (A^{-1})_{jk}$, $j, k \in \mathbb{N}_n$. In particular, if $K$, $G$ have the form

$$K(x, y)\xi = \sum_{j=1}^{n} a_j (\xi, \phi_j(x))_\Lambda \phi_j(y), \quad G(x, y)\xi = \sum_{k=1}^{m} b_k (\xi, \phi_k(x))_\Lambda \phi_k(y)$$

for some positive constants $a_j$, $b_k$, then $\mathcal{H}_K \preceq \mathcal{H}_G$ if and only if $a_j = b_j$, $j \in \mathbb{N}_n$. In both cases, if
$\mathcal{H}_K \preceq \mathcal{H}_G$ then $G$ is a nontrivial refinement of $K$ if and only if $m > n$.

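Proof: It suffices to prove the first claim. We observe that $K$, $G$ have the feature spaces $\mathcal{W} = \ell_A^2(\mathbb{N}_n)$
and $\mathcal{W}' = \ell_B^2(\mathbb{N}_m)$, respectively, with feature maps

$$\Phi(x)\xi := ((\xi, \phi_j(x))_\Lambda : j \in \mathbb{N}_n), \quad \Phi'(x)\xi := ((\xi, \phi_k(x))_\Lambda : k \in \mathbb{N}_m), \quad x \in X,\ \xi \in \Lambda.$$

Suppose that $\mathcal{H}_K \preceq \mathcal{H}_G$. Then by Theorem 3.3 there exists a bounded linear operator $T : \mathcal{W}' \to \mathcal{W}$
with properties as described there. It can be represented by an $n \times m$ matrix $D$ as

$$(T\Phi'(x)\xi)_j = \sum_{k=1}^{m} D_{jk} (\xi, \phi_k(x))_\Lambda = (\xi, \phi_j(x))_\Lambda, \quad x \in X,\ \xi \in \Lambda,\ j \in \mathbb{N}_n, \tag{3.8}$$

which implies that $D = [I_n, 0]$, where $I_n$ denotes the $n \times n$ identity matrix. The adjoint operator $T^*$
of $T$ is then represented by

$$T^* u = B^{-1} \begin{bmatrix} A \\ 0 \end{bmatrix} u, \quad u \in \mathbb{C}^n.$$

Since $T^*$ is isometric, we get that

$$(T^* u, T^* v)_{\mathcal{W}'} = (u, v)_{\mathcal{W}}, \quad u, v \in \mathbb{C}^n,$$

which has the form

$$v^* [A, 0] B^{-1} B B^{-1} \begin{bmatrix} A \\ 0 \end{bmatrix} u = v^* A u, \quad u, v \in \mathbb{C}^n.$$

We derive from the above equation that

$$[A, 0] B^{-1} \begin{bmatrix} A \\ 0 \end{bmatrix} = A.$$

Therefore, $B^{-1}$ is an augmentation of $A^{-1}$. Conversely, if this is true then $T : \mathcal{W}' \to \mathcal{W}$ defined by

$$T u' := [I_n, 0] u', \quad u' \in \mathbb{C}^m,$$

satisfies the two properties in Theorem 3.3. As a result, $\mathcal{H}_K \preceq \mathcal{H}_G$. □

It is worthwhile to point out that the above characterization is independent of the Hilbert space
$\Lambda$.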
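
The augmentation condition of Proposition 3.5 can be explored numerically. In the sketch below, a hermitian positive-definite $B$ is drawn at random and $A$ is taken to be the Schur complement of its lower-right block, which is one way (an assumption of this illustration, not a construction from the paper) to guarantee that the leading $n \times n$ block of $B^{-1}$ equals $A^{-1}$. The sketch then checks that the coefficient matrix of $G - K$ is positive semi-definite with rank $m - n$, in line with Proposition 3.1. All function names are hypothetical.

```python
import numpy as np

def random_hpd(size, rng):
    """Random hermitian, strictly positive-definite matrix."""
    M = rng.standard_normal((size, size)) + 1j * rng.standard_normal((size, size))
    return M @ M.conj().T + size * np.eye(size)

def schur_complement_block(B, n):
    """A = B11 - B12 B22^{-1} B21, so that the leading n x n block of B^{-1} is A^{-1}."""
    B11, B12 = B[:n, :n], B[:n, n:]
    B21, B22 = B[n:, :n], B[n:, n:]
    return B11 - B12 @ np.linalg.solve(B22, B21)

def is_augmentation(A, B, tol=1e-9):
    """Check (B^{-1})_{jk} = (A^{-1})_{jk} for j, k = 1, ..., n."""
    n = A.shape[0]
    return np.allclose(np.linalg.inv(B)[:n, :n], np.linalg.inv(A), atol=tol)

def difference_matrix(A, B):
    """Coefficient matrix of G - K with respect to the functions phi_1, ..., phi_m."""
    n = A.shape[0]
    M = B.copy()
    M[:n, :n] -= A
    return M

rng = np.random.default_rng(0)
n, m = 2, 5
B = random_hpd(m, rng)
A = schur_complement_block(B, n)
M = difference_matrix(A, B)
print(is_augmentation(A, B), np.linalg.matrix_rank(M, tol=1e-8))
```

By contrast, simply truncating $B$ to its leading block generally does not satisfy the condition, which the check `is_augmentation(B[:n, :n], B)` makes visible for this random draw.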

Unlike the previous two characterizations, the third one comes as a surprise, telling us that
theoretically we are able to reduce our consideration to the scalar-valued case.

Introduce for each $\mathcal{L}(\Lambda)$-valued reproducing kernel $K$ on $X$ a scalar-valued reproducing kernel $\tilde{K}$
on the extended input space $\tilde{X} := X \times \Lambda$ by setting

$$\tilde{K}((x, \xi), (y, \eta)) := (K(x, y)\xi, \eta)_\Lambda, \quad x, y \in X,\ \xi, \eta \in \Lambda.$$

By (2.1), $\tilde{K}$ is indeed positive-definite.
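
The passage to the extended input space can be mimicked numerically. The sketch below evaluates the scalar-valued kernel on pairs (input, output vector) for an assumed separable operator-valued kernel $e^{-(x-y)^2} B$ on the real line, and checks that its Gram matrix at a few random points is hermitian and positive semi-definite. The kernel, the names `operator_kernel` and `scalarized_kernel`, and the random points are illustrative assumptions.

```python
import numpy as np

def operator_kernel(x, y, B):
    """Hypothetical operator-valued kernel K(x, y) = exp(-(x - y)^2) * B on X = R."""
    return np.exp(-(x - y) ** 2) * B

def scalarized_kernel(p, q, B):
    """Scalar kernel on X x Lambda: ((x, xi), (y, eta)) -> (K(x, y) xi, eta).

    The inner product is (u, v) = sum_i u_i * conj(v_i), which is np.vdot(v, u).
    """
    (x, xi), (y, eta) = p, q
    return np.vdot(eta, operator_kernel(x, y, B) @ xi)

rng = np.random.default_rng(1)
B = np.array([[2.0, 1.0j], [-1.0j, 2.0]])  # hermitian, positive-definite
points = [(rng.standard_normal(),
           rng.standard_normal(2) + 1j * rng.standard_normal(2))
          for _ in range(5)]
gram = np.array([[scalarized_kernel(p, q, B) for q in points] for p in points])
min_eig = np.linalg.eigvalsh(gram).min()
```

Note that the scalarized kernel lives on a larger input space, so properties tied to the structure of $X$ alone need not carry over.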

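Proposition 3.6 There holds $\mathcal{H}_K \preceq \mathcal{H}_G$ if and only if $\mathcal{H}_{\tilde{K}} \preceq \mathcal{H}_{\tilde{G}}$. Furthermore, $G$ is a nontrivial
refinement of $K$ on $X$ if and only if $\tilde{G}$ is a nontrivial refinement of $\tilde{K}$ on $\tilde{X}$.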

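Proof: We first explore the close relationship between $\mathcal{H}_K$ and $\mathcal{H}_{\tilde{K}}$. By (2.3),

$$\tilde{K}((x, \xi), (y, \eta)) = (K(x, y)\xi, \eta)_\Lambda = (K(x, \cdot)\xi, K(y, \cdot)\eta)_{\mathcal{H}_K},$$

which provides a natural feature map $\tilde{\Phi} : \tilde{X} \to \mathcal{H}_K$ of $\tilde{K}$:

$$\tilde{\Phi}(x, \xi) := K(x, \cdot)\xi, \quad x \in X,\ \xi \in \Lambda.$$

The density condition $\mathcal{W}_{\tilde{\Phi}} = \mathcal{H}_K$ is clearly satisfied by (2.3). We hence obtain by (3.2) that every
function $\tilde{f}$ in $\mathcal{H}_{\tilde{K}}$ is of the form

$$\tilde{f}(x, \xi) := (f(x), \xi)_\Lambda \quad \text{for some } f \in \mathcal{H}_K$$

with

$$\|\tilde{f}\|_{\mathcal{H}_{\tilde{K}}} = \|f\|_{\mathcal{H}_K}.$$

Similar observations can be made about $\mathcal{H}_{\tilde{G}}$.

It follows immediately that $\mathcal{H}_{\tilde{K}} \preceq \mathcal{H}_{\tilde{G}}$ if $\mathcal{H}_K \preceq \mathcal{H}_G$. On the other hand, suppose that $\mathcal{H}_{\tilde{K}} \preceq \mathcal{H}_{\tilde{G}}$.
Then for each $f \in \mathcal{H}_K$ there exists some $g \in \mathcal{H}_G$ such that

$$(f(x), \xi)_\Lambda = \tilde{f}(x, \xi) = \tilde{g}(x, \xi) = (g(x), \xi)_\Lambda \quad \text{for all } x \in X,\ \xi \in \Lambda \tag{3.9}$$

and

$$\|f\|_{\mathcal{H}_K} = \|\tilde{f}\|_{\mathcal{H}_{\tilde{K}}} = \|\tilde{g}\|_{\mathcal{H}_{\tilde{G}}} = \|g\|_{\mathcal{H}_G}.$$

Equation (3.9) implies that $f = g$. Therefore, $\mathcal{H}_K \preceq \mathcal{H}_G$. □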

It appears by Proposition 3.6 that we do not have to bother studying refinement of operator-valued
reproducing kernels. Although the strategy sometimes does simplify the problem, the difficulty is
generally not reduced significantly. Instead, the result might be viewed as transferring the complexity
to the input space. Moreover, desirable properties such as translation invariance of the original kernels
might be lost in the process. As a result, an independent study of the operator-valued case remains
necessary and challenging.

4 Integral Representations

This section will be built on the theory of vector-valued measures and integrals [2, 13]. Necessary
preliminaries on the subjects will be explained in sufficient detail.

4.1 Operator-valued kernels with respect to scalar-valued measures.

Let us first introduce integration of a vector-valued function with respect to a scalar-valued measure.
Let $\mathcal{F}$ be a $\sigma$-algebra of subsets of a fixed set $\Omega$, $\mu$ a finite nonnegative measure on $\mathcal{F}$, and $\mathcal{B}$ a Banach
space. We are concerned with $\mathcal{B}$-valued functions on $\Omega$. A function $f : \Omega \to \mathcal{B}$ is said to be simple if

$$f = \sum_{j=1}^{n} a_j \chi_{E_j} \tag{4.1}$$

for some finitely many $a_j \in \mathcal{B}$ and pairwise disjoint subsets $E_j \in \mathcal{F}$, $j \in \mathbb{N}_n$. A function $f : \Omega \to \mathcal{B}$ is
called $\mu$-measurable if there exists a sequence of $\mathcal{B}$-valued simple functions $f_n$ on $\Omega$ such that

$$\lim_{n \to \infty} \|f_n(t) - f(t)\|_{\mathcal{B}} = 0 \quad \text{for } \mu\text{-a.e. } t \in \Omega,$$

where $\mu$-a.e. stands for "everywhere except for a set of zero $\mu$ measure". Finally, a $\mathcal{B}$-valued function
$f$ on $\Omega$ is called $\mu$-Bochner integrable if there exists a sequence of simple functions $f_n : \Omega \to \mathcal{B}$ such
that

$$\lim_{n \to \infty} \int_\Omega \|f_n(t) - f(t)\|_{\mathcal{B}} \, d\mu(t) = 0. \tag{4.2}$$

The integral of a simple function $f$ of the form (4.1) on any $E \in \mathcal{F}$ with respect to $\mu$ is defined by

$$\int_E f \, d\mu := \sum_{j=1}^{n} a_j \mu(E_j \cap E).$$

In general, suppose that $f$ is a $\mu$-Bochner integrable function from $\Omega$ to $\mathcal{B}$, that is, (4.2) holds true.
Then it is obvious that for each $E \in \mathcal{F}$, $\int_E f_n \, d\mu$, $n \in \mathbb{N}$, form a Cauchy sequence in $\mathcal{B}$. Therefore,

$$\int_E f \, d\mu := \lim_{n \to \infty} \int_E f_n \, d\mu.$$

The resulting integral $\int_E f \, d\mu$ is an element in $\mathcal{B}$.

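It is known that a $\mu$-measurable function $f : \Omega \to \mathcal{B}$ is Bochner integrable if and only if

$$\int_\Omega \|f(t)\|_{\mathcal{B}} \, d\mu(t) < +\infty.$$

This provides a way for us to comprehend the integral $\int_E f \, d\mu$ in the most needed case when $f$ is
$\mathcal{L}(\Lambda)$-valued. If $\mathcal{B} = \mathcal{L}(\Lambda)$ then we have for each $E \in \mathcal{F}$ that

$$\left( \left( \int_E f \, d\mu \right) \xi, \eta \right)_\Lambda = \int_E (f(t)\xi, \eta)_\Lambda \, d\mu(t), \quad \xi, \eta \in \Lambda. \tag{4.3}$$

Clearly, the right hand side above defines a sesquilinear form on $\Lambda \times \Lambda$ which is bounded as

$$\left| \int_E (f(t)\xi, \eta)_\Lambda \, d\mu(t) \right| \le \|\xi\|_\Lambda \|\eta\|_\Lambda \int_E \|f(t)\|_{\mathcal{L}(\Lambda)} \, d\mu(t),$$

where $\|\cdot\|_{\mathcal{L}(\Lambda)}$ is the operator norm on $\mathcal{L}(\Lambda)$. As a result, (4.3) gives an equivalent way of defining
the integral $\int_E f \, d\mu$ as a bounded linear operator on $\Lambda$ [9].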

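We introduce another notation before returning to reproducing kernels. Denote by $L^2(\Omega, \mathcal{B}, d\mu)$
the Banach space of all the $\mu$-measurable functions $f : \Omega \to \mathcal{B}$ such that

$$\|f\|_{L^2(\Omega, \mathcal{B}, d\mu)} := \left( \int_\Omega \|f(t)\|_{\mathcal{B}}^2 \, d\mu(t) \right)^{1/2} < +\infty.$$

When $\mathcal{B} = \mathbb{C}$, $L^2(\Omega, \mathbb{C}, d\mu)$ will be abbreviated as $L^2(\Omega, d\mu)$. When $\mathcal{B}$ is a Hilbert space, $L^2(\Omega, \mathcal{B}, d\mu)$
is also a Hilbert space with the inner product

$$(f, g)_{L^2(\Omega, \mathcal{B}, d\mu)} := \int_\Omega (f(t), g(t))_{\mathcal{B}} \, d\mu(t), \quad f, g \in L^2(\Omega, \mathcal{B}, d\mu).$$

The discussion in this section so far can be found in [13].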

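Let $\mu$, $\nu$ be two finite nonnegative measures on a $\sigma$-algebra $\mathcal{F}$ of subsets of $\Omega$. To introduce our
$\mathcal{L}(\Lambda)$-valued reproducing kernels, we also let $\mathcal{W}$ be a Hilbert space and $\phi$ a mapping from $X \times \Omega$ to
$\mathcal{L}(\Lambda, \mathcal{W})$ such that for each $x \in X$, $\phi(x, \cdot)$ belongs to both $L^2(\Omega, \mathcal{L}(\Lambda, \mathcal{W}), d\mu)$ and $L^2(\Omega, \mathcal{L}(\Lambda, \mathcal{W}), d\nu)$.
We shall investigate conditions that ensure $\mathcal{H}_K \preceq \mathcal{H}_G$ where

$$K(x, y) = \int_\Omega \phi(y, t)^* \phi(x, t) \, d\mu(t), \quad x, y \in X \tag{4.4}$$

and

$$G(x, y) = \int_\Omega \phi(y, t)^* \phi(x, t) \, d\nu(t), \quad x, y \in X, \tag{4.5}$$

where $\phi(y, t)^*$ is the adjoint operator of $\phi(y, t)$. Note that $K$, $G$ are well-defined as the integrand is
Bochner integrable with respect to both $\mu$ and $\nu$. For instance, we observe by the Cauchy-Schwarz
inequality for all $x, y \in X$ that

$$\int_\Omega \|\phi(y, t)^* \phi(x, t)\|_{\mathcal{L}(\Lambda)} \, d\mu(t) \le \int_\Omega \|\phi(y, t)^*\|_{\mathcal{L}(\mathcal{W}, \Lambda)} \|\phi(x, t)\|_{\mathcal{L}(\Lambda, \mathcal{W})} \, d\mu(t)$$

$$= \int_\Omega \|\phi(y, t)\|_{\mathcal{L}(\Lambda, \mathcal{W})} \|\phi(x, t)\|_{\mathcal{L}(\Lambda, \mathcal{W})} \, d\mu(t)$$

$$\le \|\phi(y, \cdot)\|_{L^2(\Omega, \mathcal{L}(\Lambda, \mathcal{W}), d\mu)} \|\phi(x, \cdot)\|_{L^2(\Omega, \mathcal{L}(\Lambda, \mathcal{W}), d\mu)}.$$

An alternative way of expressing $K$, $G$ is, for all $x, y \in X$ and $\xi, \eta \in \Lambda$,

$$\tilde{K}((x, \xi), (y, \eta)) = (K(x, y)\xi, \eta)_\Lambda = \int_\Omega (\phi(x, t)\xi, \phi(y, t)\eta)_{\mathcal{W}} \, d\mu(t)$$

and

$$\tilde{G}((x, \xi), (y, \eta)) = (G(x, y)\xi, \eta)_\Lambda = \int_\Omega (\phi(x, t)\xi, \phi(y, t)\eta)_{\mathcal{W}} \, d\nu(t).$$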

When $\Lambda = \mathcal{W} = \mathbb{C}$, a characterization of $\mathcal{H}_K \preceq \mathcal{H}_G$ in terms of $\mu$, $\nu$ has been established in [31].
The relation between the two measures which we shall need is absolute continuity. We say that
$\mu$ is absolutely continuous with respect to $\nu$ if for all $E \in \mathcal{F}$, $\nu(E) = 0$ implies $\mu(E) = 0$. In this
case, by the Radon-Nikodym theorem (see [25], page 121) for scalar-valued measures, there exists a
nonnegative $\nu$-integrable function, denoted by $d\mu/d\nu$, such that

$$\mu(E) = \int_E \frac{d\mu}{d\nu}(t) \, d\nu(t) \quad \text{for all } E \in \mathcal{F}.$$

We write $\mu \preceq \nu$ if $\mu$ is absolutely continuous with respect to $\nu$ and $d\mu/d\nu \in \{0, 1\}$ $\nu$-a.e.

When $\Lambda = \mathcal{W} = \mathbb{C}$, it was proved in Theorem 8 of [31] that if $\operatorname{span}\{\phi(x, \cdot) : x \in X\}$ is dense in
both $L^2(\Omega, d\mu)$ and $L^2(\Omega, d\nu)$ then $G$ is a refinement of $K$ if and only if $\mu \preceq \nu$. If $\mu \preceq \nu$ then $G$ is a
nontrivial refinement of $K$ if and only if $\nu(\Omega) - \mu(\Omega) > 0$.

Theorem 4.1 Let $K$, $G$ be given by (4.4) and (4.5). If $\operatorname{span}\{\phi(x, \cdot)\xi : x \in X,\ \xi \in \Lambda\}$ is dense in
both $L^2(\Omega, \mathcal{W}, d\mu)$ and $L^2(\Omega, \mathcal{W}, d\nu)$ then $\mathcal{H}_K \preceq \mathcal{H}_G$ if and only if $\mu \preceq \nu$. In this case, the refinement
$G$ of $K$ is nontrivial if and only if $\nu(\Omega) - \mu(\Omega) > 0$.

Proof: When $\mathcal{W} = \mathbb{C}$, as a direct consequence of Theorem 8 in [31], $\mathcal{H}_{\tilde{K}} \preceq \mathcal{H}_{\tilde{G}}$ if and only if $\mu \preceq \nu$.
The result hence follows from Proposition 3.6. When $\mathcal{W}$ is a general Hilbert space, it can be proved
by arguments similar to those in [31]. □

4.2 Scalar-valued kernels with respect to operator-valued measures. 

Again, $\mathcal{B}$ is a Banach space and $\mathcal{F}$ denotes a $\sigma$-algebra consisting of subsets of a fixed set $\Omega$. A
$\mathcal{B}$-valued measure on $\mathcal{F}$ is a function $\mu$ from $\mathcal{F}$ to $\mathcal{B}$ that is countably additive in the sense that for every
sequence of pairwise disjoint sets $E_j \in \mathcal{F}$, $j \in \mathbb{N}$,

$$\mu\left( \bigcup_{j=1}^{\infty} E_j \right) = \sum_{j=1}^{\infty} \mu(E_j),$$

where the series converges in the norm of $\mathcal{B}$. Every $\mathcal{B}$-valued measure $\mu$ on $\mathcal{F}$ comes with a
scalar-valued measure $|\mu|$ on $\mathcal{F}$ defined by

$$|\mu|(E) := \sup_{\mathcal{P}} \sum_{F \in \mathcal{P}} \|\mu(F)\|_{\mathcal{B}},$$

where the supremum is taken over all partitions $\mathcal{P}$ of $E$ into countably many pairwise disjoint members
of $\mathcal{F}$. We call $|\mu|$ the variation of $\mu$ and shall only work with those vector-valued measures $\mu$ that are
of bounded variation, that is, $|\mu|(\Omega) < +\infty$. We note that $\mu$ vanishes on sets of zero $|\mu|$ measure. It
implies that $\mu$ is absolutely continuous with respect to $|\mu|$ in the sense that

$$\lim_{|\mu|(E) \to 0} \mu(E) = 0.$$

The only type of integration that we shall need is to integrate a bounded $\mathcal{F}$-measurable
scalar-valued function with respect to a $\mathcal{B}$-valued measure of bounded variation. Denote by $L^\infty(\Omega, d|\mu|)$ the
Banach space of essentially bounded $\mathcal{F}$-measurable functions on $\Omega$ with the norm

$$\|f\|_{L^\infty(\Omega, d|\mu|)} := \inf\{a \ge 0 : |\mu|(\{|f| > a\}) = 0\}.$$



For a simple function $f : \Omega \to \mathbb{C}$ of the form

$$f = \sum_{j=1}^{n} a_j \chi_{E_j},$$

where $a_j \in \mathbb{C}$ and $E_j$ are pairwise disjoint members in $\mathcal{F}$, we define

$$\int_E f \, d\mu := \sum_{j=1}^{n} a_j \mu(E_j \cap E), \quad E \in \mathcal{F}.$$

Clearly,

$$\left\| \int_E f \, d\mu \right\|_{\mathcal{B}} \le \|f\|_{L^\infty(\Omega, d|\mu|)} \, |\mu|(E).$$

Therefore, the map sending a simple function $f$ to $\int_E f \, d\mu$ can be uniquely extended to a bounded
linear operator from $L^\infty(\Omega, d|\mu|)$ to $\mathcal{B}$. The outcome of the application of the resulting operator on a
general $f \in L^\infty(\Omega, d|\mu|)$ is still denoted by $\int_E f \, d\mu$. This is how the $\mathcal{B}$-valued integral is defined.

It is time to present the second type of reproducing kernels defined by integration:

$$K(x, y) := \int_\Omega \Psi(x, y, t) \, d\mu(t), \quad x, y \in X, \tag{4.6}$$

where $\mu$ is an $\mathcal{L}_+(\Lambda)$-valued measure on $\mathcal{F}$ of bounded variation, and $\Psi$ is a scalar-valued function such
that $\Psi(\cdot, \cdot, t)$ is a scalar-valued reproducing kernel on $X$ for all $t \in \Omega$ and for all $x, y \in X$, $\Psi(x, y, \cdot)$ is
bounded and $\mathcal{F}$-measurable. We verify that (4.6) indeed defines an $\mathcal{L}(\Lambda)$-valued reproducing kernel.

Proposition 4.2 With the above assumptions on $\Psi$ and $\mu$, the function $K$ defined by (4.6) is an
$\mathcal{L}(\Lambda)$-valued reproducing kernel on $X$.

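Proof: Fix finitely many $x_j \in X$ and $\xi_j \in \Lambda$, $j \in \mathbb{N}_n$. For any $\varepsilon > 0$, there exist simple functions

$$f_{j,k} := \sum_{l=1}^{m} \alpha_{j,k,l} \chi_{E_l}$$

such that

$$\|\Psi(x_j, x_k, \cdot) - f_{j,k}\|_{L^\infty(\Omega, d|\mu|)} \le \varepsilon, \quad j, k \in \mathbb{N}_n. \tag{4.7}$$

Here, $\alpha_{j,k,l} \in \mathbb{C}$ and $E_l$ are pairwise disjoint sets in $\mathcal{F}$ with $|\mu|(E_l) > 0$, $l \in \mathbb{N}_m$. By (4.7) and the
definition of integration in this section,

$$\left| \sum_{j=1}^{n} \sum_{k=1}^{n} (K(x_j, x_k)\xi_j, \xi_k)_\Lambda - \sum_{j=1}^{n} \sum_{k=1}^{n} \left( \left( \int_\Omega f_{j,k} \, d\mu \right) \xi_j, \xi_k \right)_\Lambda \right| \le \varepsilon \, |\mu|(\Omega) \left( \sum_{j=1}^{n} \|\xi_j\|_\Lambda \right)^2. \tag{4.8}$$

We may choose by (4.7) for each $l \in \mathbb{N}_m$ some $t_l \in E_l$ such that

$$|\Psi(x_j, x_k, t_l) - \alpha_{j,k,l}| \le \varepsilon, \quad j, k \in \mathbb{N}_n.$$

Letting

$$S := \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{l=1}^{m} \Psi(x_j, x_k, t_l) (\mu(E_l)\xi_j, \xi_k)_\Lambda,$$

we get by the above equation that

$$\left| \sum_{j=1}^{n} \sum_{k=1}^{n} \left( \left( \int_\Omega f_{j,k} \, d\mu \right) \xi_j, \xi_k \right)_\Lambda - S \right| \le \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{l=1}^{m} |\alpha_{j,k,l} - \Psi(x_j, x_k, t_l)| \, |(\mu(E_l)\xi_j, \xi_k)_\Lambda|$$

$$\le \varepsilon \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{l=1}^{m} \|\mu(E_l)\|_{\mathcal{L}(\Lambda)} \|\xi_j\|_\Lambda \|\xi_k\|_\Lambda \le \varepsilon \, |\mu|(\Omega) \left( \sum_{j=1}^{n} \|\xi_j\|_\Lambda \right)^2. \tag{4.9}$$

Combining (4.8) and (4.9) yields that

$$\left| \sum_{j=1}^{n} \sum_{k=1}^{n} (K(x_j, x_k)\xi_j, \xi_k)_\Lambda - S \right| \le 2 \varepsilon \, |\mu|(\Omega) \left( \sum_{j=1}^{n} \|\xi_j\|_\Lambda \right)^2. \tag{4.10}$$

Since $\Psi(\cdot, \cdot, t_l)$ is a scalar-valued reproducing kernel on $X$, $[\Psi(x_j, x_k, t_l) : j, k \in \mathbb{N}_n]$ is a positive
semi-definite matrix for each $l \in \mathbb{N}_m$. So are $[(\mu(E_l)\xi_j, \xi_k)_\Lambda : j, k \in \mathbb{N}_n]$, $l \in \mathbb{N}_m$, as $\mu(E_l) \in \mathcal{L}_+(\Lambda)$. By the
Schur product theorem (see, for example, [17], page 309), the Hadamard product of two positive
semi-definite matrices remains positive semi-definite. We obtain by this fact that $S \ge 0$, which together
with (4.10) and the fact that $\varepsilon$ can be arbitrarily small, proves (2.1). □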
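
The Schur product step of this proof is easy to check numerically. The sketch below draws two random positive semi-definite matrices, standing in for the matrices $[\Psi(x_j, x_k, t_l)]$ and $[(\mu(E_l)\xi_j, \xi_k)_\Lambda]$ for one fixed $l$, and verifies that their entrywise product is again positive semi-definite, so that the sum of its entries (one summand of $S$) is nonnegative. The matrices are random stand-ins and the helper `random_psd` is hypothetical.

```python
import numpy as np

def random_psd(size, rank, rng):
    """Random hermitian positive semi-definite matrix of the given rank."""
    F = rng.standard_normal((size, rank)) + 1j * rng.standard_normal((size, rank))
    return F @ F.conj().T

rng = np.random.default_rng(2)
P = random_psd(4, 2, rng)  # stand-in for [Psi(x_j, x_k, t_l)]
Q = random_psd(4, 3, rng)  # stand-in for [(mu(E_l) xi_j, xi_k)]
hadamard = P * Q           # entrywise (Hadamard) product
ones = np.ones(4)
entry_sum = (ones @ hadamard @ ones).real  # contribution of this l to S
min_eig = np.linalg.eigvalsh(hadamard).min()
```

Summing such contributions over $l$ gives $S \ge 0$, which is the only place where positivity of the measure values $\mu(E_l)$ enters the argument.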



To investigate the refinement relationship, we shall consider a simplified version of (4.6) that covers
a large class of operator-valued reproducing kernels. Let $\phi : X \times \Omega \to \mathbb{C}$ be such that $\phi(x, \cdot)$ is a
bounded $\mathcal{F}$-measurable function for every $x \in X$ and such that

$$\overline{\operatorname{span}}\{\phi(x, \cdot) : x \in X\} = L^2(\Omega, d\gamma) \quad \text{for any finite nonnegative measure } \gamma \text{ on } \mathcal{F}. \tag{4.11}$$

We shall see by the concrete examples in the next section that the denseness requirement (4.11) is not
too restrictive in applications. The kernels we shall consider are

$$K(x, y) := \int_\Omega \phi(x, t) \overline{\phi(y, t)} \, d\mu(t), \quad x, y \in X \tag{4.12}$$

and

$$G(x, y) := \int_\Omega \phi(x, t) \overline{\phi(y, t)} \, d\nu(t), \quad x, y \in X, \tag{4.13}$$

where $\mu$, $\nu$ are two $\mathcal{L}_+(\Lambda)$-valued measures on $\mathcal{F}$ of bounded variation. By Proposition 4.2, $K$, $G$ are
$\mathcal{L}(\Lambda)$-valued reproducing kernels on $X$. Our idea is to use the Radon-Nikodym property of
vector-valued measures to study the refinement property.

Let $\mathcal{B}$ be a Banach space and $\gamma$ a finite nonnegative measure on $\mathcal{F}$. We say that a $\mathcal{B}$-valued measure
$\rho$ on $\mathcal{F}$ of bounded variation has the Radon-Nikodym property with respect to $\gamma$ if there is a $\gamma$-Bochner
integrable function $\Gamma : \Omega \to \mathcal{B}$ such that for all $E \in \mathcal{F}$

$$\rho(E) = \int_E \Gamma \, d\gamma.$$

Apparently, this could only be true when $\rho$ is absolutely continuous with respect to $\gamma$. For this reason,
we also say that the space $\mathcal{B}$ has the Radon-Nikodym property with respect to $\gamma$ if every $\mathcal{B}$-valued
measure of bounded variation that is absolutely continuous with respect to $\gamma$ has the Radon-Nikodym
property with respect to $\gamma$. Moreover, $\mathcal{B}$ is said to have the Radon-Nikodym property if it has it with
respect to any finite nonnegative measure on any measure space.
Strikingly different from the scalar-valued case, a Banach space $\mathcal{B}$ may not have the Radon-Nikodym property. For instance, the Banach space $c_0$ of all sequences $a := (a_j\in\mathbb{C} : j\in\mathbb{N})$ with

\[ \lim_{j\to\infty} |a_j| = 0 \]

under the norm $\|a\|_{c_0} := \sup\{|a_j| : j\in\mathbb{N}\}$ does not have the property with respect to the Lebesgue measure (see [13], page 60). Consequently, the space $\mathcal{L}(\Lambda)$ does not have the Radon-Nikodym property when $\Lambda$ is infinite-dimensional. To see this, since $\Lambda$ is separable we let $\{e_j : j\in\mathbb{N}\}$ be an orthonormal basis for $\Lambda$. Denote by $\mathcal{L}_0(\Lambda)$ the set of all the operators $T\in\mathcal{L}(\Lambda)$ such that

\[ Te_j = a_je_j, \quad j\in\mathbb{N} \]

for some $a\in c_0$. One sees that $\|T\|_{\mathcal{L}(\Lambda)} = \|a\|_{c_0}$. As a result, $\mathcal{L}_0(\Lambda)$ is a closed subspace of $\mathcal{L}(\Lambda)$ that is isometrically isomorphic to $c_0$. Since $c_0$ does not have the Radon-Nikodym property, neither does $\mathcal{L}_0(\Lambda)$. A Banach space has the Radon-Nikodym property if and only if each of its closed linear subspaces does [H]. By this fact, $\mathcal{L}(\Lambda)$ does not have the Radon-Nikodym property.

We shall focus on the situation where this desired property holds. For example, reflexive Banach spaces have the Radon-Nikodym property [13]. In applications, $\Lambda$ is usually finite-dimensional. In this case, $\mathcal{L}(\Lambda)$ is of finite dimension as well. Any two norms on a finite-dimensional Banach space are equivalent and a finite-dimensional $\mathcal{L}(\Lambda)$ can be endowed with a norm that makes it a Hilbert space. It follows that $\mathcal{L}(\Lambda)$ is reflexive. The conclusion is that when $\Lambda$ is finite-dimensional, $\mathcal{L}(\Lambda)$ does have the Radon-Nikodym property. Another way of overcoming the difficulty is to confine ourselves to a subclass of $\mathcal{L}(\Lambda)$, for example, to the Schatten class [3]. Denote for each compact operator $T\in\mathcal{L}(\Lambda)$ by $s_j(T)$, $j\in\mathbb{N}$, the nonnegative square root of the $j$-th largest eigenvalue of $T^*T$. It is called the $j$-th singular number of $T$. For $p\in(1,+\infty)$, the $p$-th Schatten class $S_p(\Lambda)$ consists of all the compact linear operators $T\in\mathcal{L}(\Lambda)$ with the norm

\[ \|T\|_{S_p(\Lambda)} := \Bigl(\sum_{j=1}^\infty (s_j(T))^p\Bigr)^{1/p} < +\infty. \]

The $p$-th Schatten class $S_p(\Lambda)$ is a reflexive Banach space and hence has the Radon-Nikodym property. When $p=2$, $S_2(\Lambda)$ is the class of Hilbert-Schmidt operators and

\[ \|T\|_{S_2(\Lambda)} = \Bigl(\sum_{j=1}^\infty \|Te_j\|_\Lambda^2\Bigr)^{1/2}. \]

We shall not go into further details about the Radon-Nikodym property. Interested readers are referred to Chapter III of [13] and the references therein.
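In the finite-dimensional case the two norm formulas above are easy to evaluate. The sketch below is only an illustration under the assumption that $\Lambda=\mathbb{C}^n$ with the standard basis, so that $Te_j$ is the $j$-th column of the matrix of $T$; the function names are hypothetical.

```python
import math

def hilbert_schmidt_norm(t):
    """sqrt(sum_j ||T e_j||^2), where T e_j is the j-th column of the matrix t."""
    rows, cols = len(t), len(t[0])
    return math.sqrt(sum(abs(t[i][j]) ** 2 for j in range(cols) for i in range(rows)))

def schatten_norm(singular_numbers, p):
    """(sum_j s_j^p)^(1/p) for a given finite list of singular numbers."""
    return sum(s ** p for s in singular_numbers) ** (1.0 / p)

# A diagonal operator with singular numbers 4 and 3: both formulas agree for p = 2.
T = [[3.0, 0.0], [0.0, 4.0]]
hs = hilbert_schmidt_norm(T)
s2 = schatten_norm([4.0, 3.0], 2)
```

For this diagonal example the singular numbers can be read off directly; for a general matrix they would have to be computed from the eigenvalues of $T^*T$, which this sketch does not attempt.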

The assumption we shall need is that there exists a finite nonnegative measure $\gamma$ on $\mathcal{F}$ such that both $\mu$ and $\nu$ have the Radon-Nikodym property with respect to $\gamma$. In other words, there exist $\gamma$-Bochner integrable functions $\Gamma_\mu,\Gamma_\nu : \Omega\to\mathcal{L}_+(\Lambda)$ such that

\[ \mu(E) = \int_E \Gamma_\mu\,d\gamma \ \text{ and } \ \nu(E) = \int_E \Gamma_\nu\,d\gamma \ \text{ for all } E\in\mathcal{F}. \tag{4.14} \]

Such two functions exist if $\gamma := |\mu|+|\nu|$ and $\mu,\nu$ take values in the $p$-th Schatten class of $\mathcal{L}(\Lambda)$, $1<p<+\infty$.

Suppose that $K,G$ are given by (4.12) and (4.13), where $\phi,\mu,\nu$ satisfy (4.11) and (4.14). Our purpose is to investigate $\mathcal{H}_K\preceq\mathcal{H}_G$. To this end, let us first identify $\mathcal{H}_{\tilde K}$ and $\mathcal{H}_{\tilde G}$. We shall only present results for $\mathcal{H}_{\tilde K}$ as those for $\mathcal{H}_{\tilde G}$ have a similar form.
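Lemma 4.3 The RKHS $\mathcal{H}_{\tilde K}$ consists of functions $F_f$ of the form

\[ F_f(x,\xi) := \int_\Omega (\Gamma_\mu(t)f(t),\xi)_\Lambda\,\phi(x,t)\,d\gamma(t), \quad x\in X,\ \xi\in\Lambda, \]

where $f$ can be an arbitrary element from the Hilbert space $\mathcal{W}_\mu$ of $\mathcal{F}$-measurable functions from $\Omega$ to $\Lambda$ such that

\[ \|f\|_{\mathcal{W}_\mu} := \Bigl(\int_\Omega (\Gamma_\mu(t)f(t),f(t))_\Lambda\,d\gamma(t)\Bigr)^{1/2} < +\infty. \]

Moreover, $\|F_f\|_{\mathcal{H}_{\tilde K}} = \|f\|_{\mathcal{W}_\mu}$ for all $f\in\mathcal{W}_\mu$.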

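Proof: We observe for all $x,y\in X$ and $\xi,\eta\in\Lambda$ that

\[ \tilde K((x,\xi),(y,\eta)) = \int_\Omega \phi(x,t)\overline{\phi(y,t)}\,(\Gamma_\mu(t)\xi,\eta)_\Lambda\,d\gamma(t). \]

Thus, we may choose $\mathcal{W}_\mu$ as a feature space for $\tilde K$. The associated feature map $\Phi_\mu : X\times\Lambda\to\mathcal{W}_\mu$ is then selected as

\[ \Phi_\mu(x,\xi)(t) := \phi(x,t)\xi, \quad t\in\Omega. \]

We next verify the denseness condition that $\overline{\mathrm{span}}\{\Phi_\mu(x,\xi) : x\in X,\ \xi\in\Lambda\} = \mathcal{W}_\mu$. Suppose that $f\in\mathcal{W}_\mu$ is orthogonal to $\Phi_\mu(x,\xi)$ for all $x\in X$ and $\xi\in\Lambda$, that is,

\[ \int_\Omega (\Gamma_\mu(t)f(t),\xi)_\Lambda\,\phi(x,t)\,d\gamma(t) = 0 \ \text{ for all } x\in X,\ \xi\in\Lambda. \]

By (4.11),

\[ (\Gamma_\mu(t)f(t),\xi)_\Lambda = 0 \quad \gamma\text{-a.e.} \]

As this holds for an arbitrary $\xi\in\Lambda$, $\Gamma_\mu(t)f(t) = 0$ $\gamma$-a.e. It implies that $\|f\|_{\mathcal{W}_\mu} = 0$. The result now follows immediately from Lemma 3.2. □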

For two operators $A,B\in\mathcal{L}_+(\Lambda)$, we write $A\preceq B$ if for all $\xi\in\Lambda$ there exists some $\eta\in\Lambda$ such that

\[ A\xi = B\eta \ \text{ and } \ (A\xi,\xi)_\Lambda = (B\eta,\eta)_\Lambda. \]

We make a simple observation about this special relationship between two linear operators.

Let $\ker(A)$ and $\mathrm{ran}(A)$ be the kernel and range of $A$, respectively. If $\mathrm{ran}(A)$ is closed then as $A$ is self-adjoint, there holds the direct sum decomposition

\[ \Lambda = \ker(A)\oplus\mathrm{ran}(A). \tag{4.15} \]

Thus, $A$ is bijective and bounded from $\mathrm{ran}(A)$ to $\mathrm{ran}(A)$. By the open mapping theorem, it has a bounded inverse on $\mathrm{ran}(A)$, which we denote by $A^{-1}$.

Proposition 4.4 Suppose that $A,B\in\mathcal{L}_+(\Lambda)$ have closed range. Then $A\preceq B$ if and only if

\[ \mathrm{ran}(A)\subseteq\mathrm{ran}(B) \tag{4.16} \]

and

\[ P_{B,A}B^{-1} = A^{-1} \ \text{ on } \mathrm{ran}(A), \tag{4.17} \]

where $P_{B,A}$ denotes the orthogonal projection from $\mathrm{ran}(B)$ to $\mathrm{ran}(A)$. Particularly, if $A$ is onto then $A\preceq B$ if and only if $A=B$.
Proof: Let $A,B$ have closed range. Suppose first that $A\preceq B$. Then (4.16) clearly holds true. Set for each $\xi\in\mathrm{ran}(A)$

\[ \eta_\xi := B^{-1}A\xi. \]

Clearly, the mapping $\xi\to\eta_\xi$ is linear from $\mathrm{ran}(A)$ to $\mathrm{ran}(B)$. Thus, we have for arbitrary $\xi,\xi'\in\mathrm{ran}(A)$ that

\[ (A\xi'+A\xi,\ \xi'+\xi)_\Lambda = (B\eta_{\xi'+\xi},\ \eta_{\xi'+\xi})_\Lambda = (B\eta_{\xi'}+B\eta_\xi,\ \eta_{\xi'}+\eta_\xi)_\Lambda, \]

which implies that

\[ \mathrm{Re}\,(A\xi',\xi)_\Lambda = \mathrm{Re}\,(B\eta_{\xi'},\eta_\xi)_\Lambda. \]

A textbook trick yields that for all $\xi,\xi'\in\mathrm{ran}(A)$,

\[ (A\xi',\xi)_\Lambda = (B\eta_{\xi'},\eta_\xi)_\Lambda = (A\xi',\eta_\xi)_\Lambda. \]

We hence obtain that $\xi-\eta_\xi\in\ker(A)$ for all $\xi\in\mathrm{ran}(A)$. Consequently,

\[ A\xi - AB^{-1}A\xi = A\xi - A\eta_\xi = 0 \ \text{ for all } \xi\in\mathrm{ran}(A), \]

from which (4.17) follows.

On the other hand, suppose that (4.16) and (4.17) hold true. Then we choose for each $\xi\in\Lambda$

\[ \eta := B^{-1}A\xi \]

and verify that $B\eta = A\xi$ and

\[ (B\eta,\eta)_\Lambda = (A\xi,B^{-1}A\xi)_\Lambda = (A\xi,P_{B,A}B^{-1}A\xi)_\Lambda = (A\xi,A^{-1}A\xi)_\Lambda = (A\xi,\xi)_\Lambda. \]

Finally, if $A$ is onto then by (4.16), $\mathrm{ran}(A) = \mathrm{ran}(B) = \Lambda$. According to (4.15), both $A$ and $B$ are injective. Therefore, they possess a bounded inverse on $\Lambda$. It implies that $P_{B,A}$ is the identity operator on $\Lambda$. By equation (4.17), $A=B$. The proof is complete. □
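Proposition 4.4 becomes very concrete for diagonal matrices. The following is a minimal sketch under the assumption that $A=\mathrm{diag}(a)$ and $B=\mathrm{diag}(b)$ are real diagonal matrices with nonnegative entries; in that case (4.16) and (4.17) reduce to the requirement that $b_i=a_i$ whenever $a_i>0$. The function names are hypothetical, and the second function checks the defining relation directly for one real test vector.

```python
def refines_diag(a, b, tol=1e-12):
    """Conditions (4.16)-(4.17) for A = diag(a), B = diag(b) with nonnegative entries."""
    for ai, bi in zip(a, b):
        if ai > tol:                  # e_i lies in ran(A)
            if bi <= tol:             # (4.16) fails: e_i is not in ran(B)
                return False
            if abs(ai - bi) > tol:    # (4.17) fails: 1/b_i differs from 1/a_i
                return False
    return True

def definition_holds(a, b, xi, tol=1e-12):
    """Check A xi = B eta and (A xi, xi) = (B eta, eta) with eta = B^{-1} A xi."""
    eta = []
    for ai, bi, x in zip(a, b, xi):
        if ai * x == 0:
            eta.append(0.0)           # nothing to solve in this coordinate
        elif bi <= tol:
            return False              # A xi is not in ran(B)
        else:
            eta.append(ai * x / bi)
    lhs = sum(ai * x * x for ai, x in zip(a, xi))     # (A xi, xi)
    rhs = sum(bi * e * e for bi, e in zip(b, eta))    # (B eta, eta)
    return abs(lhs - rhs) <= tol
```

For instance, $A=\mathrm{diag}(1,0)$ and $B=\mathrm{diag}(1,2)$ satisfy $A\preceq B$, while $B=\mathrm{diag}(2,2)$ does not, because the quadratic form is halved on $\mathrm{ran}(A)$.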

We are ready to present the main result of this section. 



Theorem 4.5 Let $K,G$ be given by (4.12) and (4.13), where $\phi,\mu,\nu$ satisfy (4.11) and (4.14). Then $\mathcal{H}_K\preceq\mathcal{H}_G$ if and only if $\Gamma_\mu\preceq\Gamma_\nu$ $\gamma$-a.e.

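Proof: By Proposition 3.6 and Lemma 4.3, $\mathcal{H}_K\preceq\mathcal{H}_G$ if and only if for all $f\in\mathcal{W}_\mu$ there exists some $g\in\mathcal{W}_\nu$ such that

\[ \int_\Omega (\Gamma_\mu(t)f(t),\xi)_\Lambda\,\phi(x,t)\,d\gamma(t) = \int_\Omega (\Gamma_\nu(t)g(t),\xi)_\Lambda\,\phi(x,t)\,d\gamma(t) \ \text{ for all } x\in X,\ \xi\in\Lambda \tag{4.18} \]

and

\[ \int_\Omega (\Gamma_\mu(t)f(t),f(t))_\Lambda\,d\gamma(t) = \int_\Omega (\Gamma_\nu(t)g(t),g(t))_\Lambda\,d\gamma(t). \tag{4.19} \]

By the denseness condition (4.11), (4.18) holds true if and only if

\[ (\Gamma_\mu(t)f(t),\xi)_\Lambda = (\Gamma_\nu(t)g(t),\xi)_\Lambda \ \text{ for } \gamma\text{-a.e. } t\in\Omega \text{ and all } \xi\in\Lambda, \]

which is equivalent to

\[ \Gamma_\mu(t)f(t) = \Gamma_\nu(t)g(t) \ \text{ for } \gamma\text{-a.e. } t\in\Omega. \tag{4.20} \]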
We conclude that $\mathcal{H}_K\preceq\mathcal{H}_G$ if and only if for every $f\in\mathcal{W}_\mu$ there exists some $g\in\mathcal{W}_\nu$ such that equations (4.19) and (4.20) hold true.

Suppose that $\Gamma_\mu\preceq\Gamma_\nu$ $\gamma$-a.e. Then clearly, for each $f\in\mathcal{W}_\mu$, we can find a function $g : \Omega\to\Lambda$ which is defined $\gamma$-almost everywhere and satisfies (4.20) and

\[ (\Gamma_\mu(t)f(t),f(t))_\Lambda = (\Gamma_\nu(t)g(t),g(t))_\Lambda \ \text{ for } \gamma\text{-a.e. } t\in\Omega. \]

The above equation implies (4.19). Therefore, $\mathcal{H}_K\preceq\mathcal{H}_G$.

On the other hand, suppose that we can find for every $f\in\mathcal{W}_\mu$ some $g_f\in\mathcal{W}_\nu$ satisfying (4.19) and (4.20). The function $g_f$ can be chosen so that $f\to g_f$ is linear from $\mathcal{W}_\mu$ to $\mathcal{W}_\nu$. A trick similar to that used in Lemma 4.3 enables us to obtain from (4.19) and (4.20) that

\[ \int_\Omega (\Gamma_\mu(t)f'(t),\ f(t)-g_f(t))_\Lambda\,d\gamma(t) = 0 \ \text{ for all } f'\in\mathcal{W}_\mu. \]

Letting $f' := \phi(x,\cdot)\xi$ for arbitrary $x\in X$ and $\xi\in\Lambda$ in the above equation and invoking (4.11), we have that

\[ \Gamma_\mu(t)(f(t)-g_f(t)) = 0 \ \text{ for } \gamma\text{-a.e. } t\in\Omega. \]

By the above equation and (4.20), we get for $\gamma$-almost every $t\in\Omega$ that

\[ (\Gamma_\nu(t)g_f(t),g_f(t))_\Lambda = (\Gamma_\mu(t)f(t),g_f(t))_\Lambda = (f(t),\Gamma_\mu(t)g_f(t))_\Lambda = (f(t),\Gamma_\mu(t)f(t))_\Lambda = (\Gamma_\mu(t)f(t),f(t))_\Lambda. \]

Since (4.20) and the above equation are true for an arbitrary $f\in\mathcal{W}_\mu$, $\Gamma_\mu\preceq\Gamma_\nu$ $\gamma$-a.e. □

5 Examples 

We present in this section several concrete examples of refinement of operator-valued reproducing 
kernels. They are built on the general characterizations established in the last two sections. 

5.1 Translation invariant reproducing kernels 

Let $d\in\mathbb{N}$ and $K$ be an $\mathcal{L}(\Lambda)$-valued reproducing kernel on $\mathbb{R}^d$. We say that $K$ is translation invariant if for all $x,y,a\in\mathbb{R}^d$

\[ K(x-a,y-a) = K(x,y). \]

A celebrated characterization due to Bochner [1] states that every continuous scalar-valued translation invariant reproducing kernel on $\mathbb{R}^d$ must be the Fourier transform of a finite nonnegative Borel measure on $\mathbb{R}^d$, and vice versa. This result has been generalized to the operator-valued case [2], [8], [IB]. Specifically, a continuous function $K$ from $\mathbb{R}^d\times\mathbb{R}^d$ to $\mathcal{L}(\Lambda)$ is a translation invariant reproducing kernel if and only if it has the form

\[ K(x,y) = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\,d\mu(t), \quad x,y\in\mathbb{R}^d, \tag{5.1} \]

for some $\mu\in\mathcal{B}(\mathbb{R}^d,\Lambda)$, the set of all the $\mathcal{L}_+(\Lambda)$-valued measures of bounded variation on the $\sigma$-algebra of Borel subsets in $\mathbb{R}^d$. Let $G$ be the kernel given by

\[ G(x,y) = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\,d\nu(t), \quad x,y\in\mathbb{R}^d, \tag{5.2} \]

where $\nu\in\mathcal{B}(\mathbb{R}^d,\Lambda)$. The purpose of this subsection is to characterize $\mathcal{H}_K\preceq\mathcal{H}_G$ in terms of $\mu,\nu$. To this end, we first investigate the structure of the RKHS of a translation invariant $\mathcal{L}(\Lambda)$-valued reproducing kernel.

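Let $\gamma$ be an arbitrary measure in $\mathcal{B}(\mathbb{R}^d,\Lambda)$ and $L$ the associated translation invariant reproducing kernel defined by

\[ L(x,y) = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\,d\gamma(t), \quad x,y\in\mathbb{R}^d. \tag{5.3} \]

There exists a decomposition of $\gamma$ with respect to the Lebesgue measure $dx$ on $\mathbb{R}^d$ [13] as follows:

\[ \gamma = \gamma_c + \gamma_s, \]

where $\gamma_c,\gamma_s$ are the unique measures in $\mathcal{B}(\mathbb{R}^d,\Lambda)$ such that $\gamma_c$ is absolutely continuous with respect to $dx$, and for each continuous linear functional $\lambda$ on $\mathcal{L}(\Lambda)$, the scalar-valued measure $\lambda\gamma_s$ and $dx$ are mutually singular. This decomposition of measures induces a decomposition of $L$:

\[ L = L_c + L_s, \]

where

\[ L_c(x,y) = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\,d\gamma_c(t), \quad L_s(x,y) = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\,d\gamma_s(t), \quad x,y\in\mathbb{R}^d. \tag{5.4} \]

Our first observation is that $\mathcal{H}_L$ is the orthogonal direct sum of $\mathcal{H}_{L_c}$ and $\mathcal{H}_{L_s}$. Two lemmas are needed to prove this useful fact.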



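Lemma 5.1 Let $L_c, L_s$ be given by (5.4). Then for all $\xi\in\Lambda$ and $x,y\in\mathbb{R}^d$

\[ (L_a(x,y)\xi,\xi)_\Lambda = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\,d\gamma_{a,\xi}(t), \quad a = c \text{ or } s, \tag{5.5} \]

where $\gamma_{a,\xi}$ is a scalar-valued Borel measure on $\mathbb{R}^d$ defined for each Borel set $E\subseteq\mathbb{R}^d$ by

\[ \gamma_{a,\xi}(E) := (\gamma_a(E)\xi,\xi)_\Lambda, \quad a = c \text{ or } s. \]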

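Proof: Let $a\in\{c,s\}$, $\xi\in\Lambda$, $x,y\in\mathbb{R}^d$, and $s_n$ be a sequence of simple functions on $\mathbb{R}^d$ that converges to $e^{i(x-y)\cdot t}$ in $L^\infty(\mathbb{R}^d,dx)$. Then

\[ \lim_{n\to\infty}\Bigl(\Bigl(\int_{\mathbb{R}^d} s_n\,d\gamma_a\Bigr)\xi,\xi\Bigr)_\Lambda = (L_a(x,y)\xi,\xi)_\Lambda. \]

By definition, we have for each $n\in\mathbb{N}$ that

\[ \Bigl(\Bigl(\int_{\mathbb{R}^d} s_n\,d\gamma_a\Bigr)\xi,\xi\Bigr)_\Lambda = \int_{\mathbb{R}^d} s_n\,d\gamma_{a,\xi}. \]

As

\[ \lim_{n\to\infty}\int_{\mathbb{R}^d} s_n\,d\gamma_{a,\xi} = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\,d\gamma_{a,\xi}(t), \]

we conclude from the previous two equations that (5.5) holds true. □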



Lemma 5.2 There holds $\mathcal{H}_{L_c}\cap\mathcal{H}_{L_s} = \{0\}$.

Proof: We introduce for each $\xi\in\Lambda$ two scalar-valued translation invariant reproducing kernels on $\mathbb{R}^d$ by setting

\[ A_a(x,y) := (L_a(x,y)\xi,\xi)_\Lambda, \quad x,y\in\mathbb{R}^d,\ a\in\{c,s\}. \]

By Lemma 5.1 we have the alternative representations for $A_c$ and $A_s$

\[ A_a(x,y) = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\,d\gamma_{a,\xi}(t), \quad x,y\in\mathbb{R}^d,\ a = c \text{ or } s. \]

By the Lebesgue decomposition of $\gamma$, $\gamma_{c,\xi}$ is absolutely continuous with respect to $dx$ while $\gamma_{s,\xi}$ and $dx$ are mutually singular. As a consequence, $\mathcal{H}_{A_c}\cap\mathcal{H}_{A_s} = \{0\}$ by Lemma 17 in [31].
Let a G {c,s}. By (fOD . 

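A feature map for $A_a$ may hence be chosen as

\[ \Phi_a(x) := L_a(x,\cdot)\xi, \quad x\in\mathbb{R}^d, \]

with the feature space being $\mathcal{H}_{L_a}$. We identify by Lemma 3.2 that

\[ \mathcal{H}_{A_a} = \{(f(\cdot),\xi)_\Lambda : f\in\mathcal{H}_{L_a}\}. \tag{5.6} \]

Assume that $\mathcal{H}_{L_c}\cap\mathcal{H}_{L_s}\ne\{0\}$. Then there exist nontrivial functions $f\in\mathcal{H}_{L_c}$ and $g\in\mathcal{H}_{L_s}$ such that $f=g$. As a result, there exists some $\xi\in\Lambda$ such that $(f(\cdot),\xi)_\Lambda$ is not the trivial function. By equation (5.6),

\[ (f(\cdot),\xi)_\Lambda = (g(\cdot),\xi)_\Lambda\in\mathcal{H}_{A_c}\cap\mathcal{H}_{A_s}, \]

contradicting the fact that $\mathcal{H}_{A_c}\cap\mathcal{H}_{A_s} = \{0\}$. □

Theorem 5.3 The space $\mathcal{H}_L$ is the orthogonal direct sum of $\mathcal{H}_{L_c}$ and $\mathcal{H}_{L_s}$, namely, $\mathcal{H}_L = \mathcal{H}_{L_c}\oplus\mathcal{H}_{L_s}$.

Proof: The result follows directly from Lemma 5.2 and Proposition 3.1. □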

We are now in a position to study the refinement relationship $\mathcal{H}_K\preceq\mathcal{H}_G$, where $K,G$ are defined by (5.1) and (5.2). Firstly, the task can be separated into two related ones according to the Lebesgue decomposition of the measures $\mu,\nu$.

Proposition 5.4 There holds $\mathcal{H}_K\preceq\mathcal{H}_G$ if and only if $\mathcal{H}_{K_c}\preceq\mathcal{H}_{G_c}$ and $\mathcal{H}_{K_s}\preceq\mathcal{H}_{G_s}$.

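Proof: By Theorem 5.3, $\mathcal{H}_K = \mathcal{H}_{K_c}\oplus\mathcal{H}_{K_s}$ and $\mathcal{H}_G = \mathcal{H}_{G_c}\oplus\mathcal{H}_{G_s}$. Therefore, if $\mathcal{H}_{K_c}\preceq\mathcal{H}_{G_c}$ and $\mathcal{H}_{K_s}\preceq\mathcal{H}_{G_s}$ then $\mathcal{H}_K\preceq\mathcal{H}_G$.

On the other hand, suppose that $\mathcal{H}_K\preceq\mathcal{H}_G$. Let $f\in\mathcal{H}_{K_c}$. Then $f\in\mathcal{H}_K$ and $\|f\|_{\mathcal{H}_{K_c}} = \|f\|_{\mathcal{H}_K}$. Since $\mathcal{H}_K\preceq\mathcal{H}_G$, there exist $g\in\mathcal{H}_{G_c}$ and $h\in\mathcal{H}_{G_s}$ such that

\[ f = g + h \]

and

\[ \|f\|^2_{\mathcal{H}_{K_c}} = \|f\|^2_{\mathcal{H}_K} = \|g+h\|^2_{\mathcal{H}_G} = \|g\|^2_{\mathcal{H}_{G_c}} + \|h\|^2_{\mathcal{H}_{G_s}}. \]

Therefore, to show that $\mathcal{H}_{K_c}\preceq\mathcal{H}_{G_c}$ it suffices to show that $h=0$. Assume that $h\ne 0$. Noting that $h = f-g\in\mathcal{H}_{K_c+G_c}$, we get that

\[ \mathcal{H}_{K_c+G_c}\cap\mathcal{H}_{G_s}\ne\{0\}. \tag{5.7} \]

However,

\[ (K_c+G_c)(x,y) = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\,d(\mu_c+\nu_c)(t), \quad x,y\in\mathbb{R}^d, \]

and $\mu_c+\nu_c$ is absolutely continuous with respect to $dx$. Thus, equation (5.7) contradicts Lemma 5.2. The contradiction proves that $\mathcal{H}_{K_c}\preceq\mathcal{H}_{G_c}$. Likewise, one can prove that $\mathcal{H}_{K_s}\preceq\mathcal{H}_{G_s}$. □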

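By Proposition 5.4, we shall study $\mathcal{H}_{K_c}\preceq\mathcal{H}_{G_c}$ and $\mathcal{H}_{K_s}\preceq\mathcal{H}_{G_s}$ separately. The kernels to be considered are of the following special forms:

\[ K_c(x,y) := \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\varphi_1(t)\,dt, \quad G_c(x,y) := \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\varphi_2(t)\,dt, \quad x,y\in\mathbb{R}^d \tag{5.8} \]

and

\[ K_s(x,y) := \sum_{j\in\mathbb{J}_1} e^{i(x-y)\cdot t_j}A_j, \quad G_s(x,y) := \sum_{k\in\mathbb{J}_2} e^{i(x-y)\cdot t_k}B_k, \quad x,y\in\mathbb{R}^d. \tag{5.9} \]

Here, $\varphi_1,\varphi_2$ are two $dx$-Bochner integrable functions from $\mathbb{R}^d$ to $\mathcal{L}_+(\Lambda)$, $\{t_j : j\in\mathbb{J}_1\}$ and $\{t_k : k\in\mathbb{J}_2\}$ are countable sets of pairwise distinct points in $\mathbb{R}^d$, and $A_j,B_k$ are nonzero operators in $\mathcal{L}_+(\Lambda)$ such that

\[ \sum_{j\in\mathbb{J}_1}\|A_j\|_{\mathcal{L}(\Lambda)} < +\infty, \quad \sum_{k\in\mathbb{J}_2}\|B_k\|_{\mathcal{L}(\Lambda)} < +\infty. \]

The following characterization is a direct consequence of Theorem 4.5.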



Proposition 5.5 Let $K_c,G_c$ be given by (5.8). Then $\mathcal{H}_{K_c}\preceq\mathcal{H}_{G_c}$ if and only if $\varphi_1(t)\preceq\varphi_2(t)$ for every $t\in\mathbb{R}^d$ except for a subset of $\mathbb{R}^d$ of zero Lebesgue measure.



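Proof: As $\varphi_1,\varphi_2$ are $dx$-Bochner integrable,

\[ \int_{\mathbb{R}^d}\|\varphi_j(t)\|_{\mathcal{L}(\Lambda)}\,dt < +\infty, \quad j=1,2. \]

Define a finite nonnegative Borel measure $\gamma$ on $\mathbb{R}^d$ by setting for each Borel subset $E$ in $\mathbb{R}^d$

\[ \gamma(E) := \int_E \bigl(\|\varphi_1(t)\|_{\mathcal{L}(\Lambda)} + \|\varphi_2(t)\|_{\mathcal{L}(\Lambda)}\bigr)\,dt. \]

Evidently, $K_c,G_c$ have the form

\[ K_c(x,y) = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\Gamma_1(t)\,d\gamma(t), \quad G_c(x,y) = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\Gamma_2(t)\,d\gamma(t), \]

where for $j=1,2$,

\[ \Gamma_j(t) := \begin{cases} \dfrac{\varphi_j(t)}{\|\varphi_1(t)\|_{\mathcal{L}(\Lambda)} + \|\varphi_2(t)\|_{\mathcal{L}(\Lambda)}}, & \text{if } \|\varphi_1(t)\|_{\mathcal{L}(\Lambda)} + \|\varphi_2(t)\|_{\mathcal{L}(\Lambda)} > 0, \\ 0, & \text{otherwise.} \end{cases} \]

It is also clear that $\mathrm{span}\{e^{ix\cdot t} : x\in\mathbb{R}^d\}$ is dense in $L^2(\mathbb{R}^d,d\gamma)$. By Theorem 4.5, $\mathcal{H}_{K_c}\preceq\mathcal{H}_{G_c}$ if and only if $\Gamma_1\preceq\Gamma_2$ $\gamma$-a.e. Note that $\Gamma_1(t)\preceq\Gamma_2(t)$ if and only if $\varphi_1(t)\preceq\varphi_2(t)$. If $\varphi_1\preceq\varphi_2$ $dx$-a.e. then $\Gamma_1\preceq\Gamma_2$ $\gamma$-a.e. as $\gamma$ is absolutely continuous with respect to the Lebesgue measure. On the other hand, suppose that $\Gamma_1\preceq\Gamma_2$ $\gamma$-a.e. Set

\[ E := \{t\in\mathbb{R}^d : \|\varphi_1(t)\|_{\mathcal{L}(\Lambda)} + \|\varphi_2(t)\|_{\mathcal{L}(\Lambda)} > 0\}. \]

For $t\in E^c$, $\varphi_1(t) = \varphi_2(t) = 0$, and thus, $\varphi_1(t)\preceq\varphi_2(t)$. Assume that there exists a Borel subset $F\subseteq\mathbb{R}^d$ with positive Lebesgue measure on which $\varphi_1(t)\not\preceq\varphi_2(t)$. Then $F\subseteq E$. We reach that $\gamma(F)>0$ and $\Gamma_1(t)\not\preceq\Gamma_2(t)$ for $t\in F$, contradicting the fact that $\Gamma_1\preceq\Gamma_2$ $\gamma$-a.e. □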

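For $K_s,G_s$, we have the following result.

Proposition 5.6 There holds $\mathcal{H}_{K_s}\preceq\mathcal{H}_{G_s}$ if and only if

(1) $\{t_j : j\in\mathbb{J}_1\}\subseteq\{t_k : k\in\mathbb{J}_2\}$;

(2) for each $j\in\mathbb{J}_1$, $A_j\preceq B_j$. Here, re-indexing by condition (1) if necessary, we may assume that $\mathbb{J}_1\subseteq\mathbb{J}_2$.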

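Proof: Introduce a discrete scalar-valued Borel measure $\gamma$ that is supported on $\{t_j : j\in\mathbb{J}_1\}\cup\{t_k : k\in\mathbb{J}_2\}$ by setting

\[ \gamma(\{t_k\}) := \begin{cases} \|A_k\|_{\mathcal{L}(\Lambda)} + \|B_k\|_{\mathcal{L}(\Lambda)}, & k\in\mathbb{J}_1\cap\mathbb{J}_2, \\ \|B_k\|_{\mathcal{L}(\Lambda)}, & k\in\mathbb{J}_2\setminus\mathbb{J}_1, \\ \|A_k\|_{\mathcal{L}(\Lambda)}, & k\in\mathbb{J}_1\setminus\mathbb{J}_2. \end{cases} \]

We also let

\[ \Gamma_A(t_j) := \frac{A_j}{\gamma(\{t_j\})},\ j\in\mathbb{J}_1 \ \text{ and } \ \Gamma_B(t_k) := \frac{B_k}{\gamma(\{t_k\})},\ k\in\mathbb{J}_2. \]

They are discrete $\mathcal{L}(\Lambda)$-valued functions supported on $\{t_j : j\in\mathbb{J}_1\}$ and $\{t_k : k\in\mathbb{J}_2\}$, respectively. We reach the following integral representation:

\[ K_s(x,y) = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\Gamma_A(t)\,d\gamma(t) \ \text{ and } \ G_s(x,y) = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\Gamma_B(t)\,d\gamma(t), \quad x,y\in\mathbb{R}^d. \]

By Theorem 4.5, $\mathcal{H}_{K_s}\preceq\mathcal{H}_{G_s}$ if and only if $\Gamma_A\preceq\Gamma_B$ $\gamma$-a.e. It is straightforward to verify that the latter is equivalent to conditions (1)-(2). □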
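Conditions (1)-(2) of Proposition 5.6 are easy to test in the simplest setting. The sketch below assumes $\Lambda=\mathbb{C}$ and $d=1$, so that each $A_j$, $B_k$ is a positive number and, by Proposition 4.4, $A_j\preceq B_j$ reduces to $A_j=B_j$. The dictionary representation of the atoms $t_j$ and the function names are hypothetical choices made for illustration.

```python
import cmath

def discrete_kernel(atoms):
    """K(x, y) = sum_j exp(i (x - y) t_j) a_j for atoms {t_j: a_j}, scalar case, d = 1."""
    def k(x, y):
        return sum(a * cmath.exp(1j * (x - y) * t) for t, a in atoms.items())
    return k

def refines_discrete(atoms_k, atoms_g, tol=1e-12):
    """Conditions (1)-(2) when Lambda = C: every atom of K is an atom of G with equal weight."""
    return all(t in atoms_g and abs(a - atoms_g[t]) <= tol
               for t, a in atoms_k.items())

atoms_K = {0.0: 1.0}                 # K_s has a single atom at t = 0
atoms_G = {0.0: 1.0, 2.0: 0.5}       # G_s adds an atom at t = 2
G_s = discrete_kernel(atoms_G)
```

With these atoms `refines_discrete(atoms_K, atoms_G)` holds, whereas changing the weight of $G_s$ at $t=0$ or moving the atom of $K_s$ outside the support of $G_s$ breaks the refinement.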



5.2 Hessian of scalar-valued reproducing kernels

Propositions 5.5 and 5.6 were established based on Theorem 4.5. In this subsection, we shall consider special translation invariant reproducing kernels and establish the characterization of refinement using Theorem 4.1.

Let $k$ be a continuously differentiable translation invariant reproducing kernel on $\mathbb{R}^d$. We consider the following matrix-valued functions

\[ K(x,y) := \nabla^2_{xy}k(x,y) := \Bigl[\frac{\partial^2 k}{\partial x_j\partial y_k}(x,y) : j,k\in\mathbb{N}_d\Bigr], \quad x,y\in\mathbb{R}^d. \tag{5.10} \]
To ensure that $K$ is an $\mathcal{L}(\mathbb{C}^d)$-valued reproducing kernel on $\mathbb{R}^d$, we make use of the Bochner theorem to get some finite nonnegative Borel measure $\mu$ on $\mathbb{R}^d$ such that

\[ k(x,y) = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\,d\mu(t), \quad x,y\in\mathbb{R}^d \tag{5.11} \]

and impose the requirement that

\[ \int_{\mathbb{R}^d} t^Tt\,d\mu(t) < +\infty. \tag{5.12} \]

One sees by the Lebesgue dominated convergence theorem that

\[ K(x,y) = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\,tt^T\,d\mu(t), \quad x,y\in\mathbb{R}^d, \tag{5.13} \]

where we view $t\in\mathbb{R}^d$ as a $d\times 1$ vector and $t^T$ denotes its transpose $[t_1,t_2,\ldots,t_d]$. By the general integral representation (4.4) of operator-valued reproducing kernels, $K$ defined by (5.10) is an $\mathcal{L}(\mathbb{C}^d)$-valued reproducing kernel on $\mathbb{R}^d$. Matrix-valued translation invariant reproducing kernels of the form (5.10) are useful for the development of divergence-free kernel methods for solving some special partial differential equations (see, for example, [18], [29] and the references therein). Another class of kernels constructed from the Hessian of a scalar-valued translation invariant reproducing kernel is widely applied to the learning of a multivariate function together with its gradient simultaneously [22], [23], [32]. Such applications make use of kernels of the form



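\[ \tilde K(x,y) := \begin{bmatrix} k(x,y) & (\nabla_yk(x,y))^* \\ \nabla_xk(x,y) & \nabla^2_{xy}k(x,y) \end{bmatrix}. \tag{5.14} \]

One sees that under condition (5.12)

\[ \tilde K(x,y) = \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\,p(t)p(t)^*\,d\mu(t), \quad x,y\in\mathbb{R}^d, \]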

where 

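We aim at refining matrix-valued reproducing kernels of the forms (5.10) and (5.14) in this subsection. Specifically, we let $\nu$ be another finite nonnegative Borel measure on $\mathbb{R}^d$ satisfying

\[ \int_{\mathbb{R}^d} t^Tt\,d\nu(t) < +\infty \tag{5.15} \]

and define for $x,y\in\mathbb{R}^d$

\[ g(x,y) := \int_{\mathbb{R}^d} e^{i(x-y)\cdot t}\,d\nu(t), \quad G(x,y) := \nabla^2_{xy}g(x,y), \quad \tilde G(x,y) := \begin{bmatrix} g(x,y) & (\nabla_yg(x,y))^* \\ \nabla_xg(x,y) & \nabla^2_{xy}g(x,y) \end{bmatrix}. \tag{5.16} \]

Our purpose is to characterize $\mathcal{H}_K\preceq\mathcal{H}_G$ and $\mathcal{H}_{\tilde K}\preceq\mathcal{H}_{\tilde G}$ in terms of $k,g$ and $\mu,\nu$.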

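Theorem 5.7 Let $\mu,\nu$ be finite nonnegative Borel measures on $\mathbb{R}^d$ satisfying (5.12) and (5.15), and $k,g$ be defined by (5.11) and (5.16). Then $K,G,\tilde K,\tilde G$ are matrix-valued translation invariant reproducing kernels on $\mathbb{R}^d$. The four relationships $\mathcal{H}_K\preceq\mathcal{H}_G$, $\mathcal{H}_{\tilde K}\preceq\mathcal{H}_{\tilde G}$, $\mathcal{H}_k\preceq\mathcal{H}_g$, and $\mu\preceq\nu$ are equivalent.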
Proof: By Theorem 4.1 or a result in [31], $\mathcal{H}_k\preceq\mathcal{H}_g$ if and only if $\mu\preceq\nu$. We shall show by Theorem 4.1 that $\mathcal{H}_K\preceq\mathcal{H}_G$ if and only if $\mu\preceq\nu$. The equivalence of $\mathcal{H}_{\tilde K}\preceq\mathcal{H}_{\tilde G}$ and $\mu\preceq\nu$ can be proved similarly. Set

Then for each $x,t\in\mathbb{R}^d$, $\phi(x,t)$ is a linear functional from $\mathbb{C}^d$ to $\mathbb{C}$. We observe by (5.13) that (4.4) holds true. So does (4.5). To apply Theorem 4.1 it remains to verify that $\mathrm{span}\{\phi(x,\cdot)\xi : x\in\mathbb{R}^d,\ \xi\in\mathbb{C}^d\}$ is dense in the Hilbert space $L^2(\mathbb{R}^d,d\mu)$, which is straightforward. The claim follows immediately from Theorem 4.1. □



5.3 Transformation reproducing kernels 

Let us consider a particular class of matrix-valued reproducing kernels whose universality was studied in [6]. The kernels we shall construct are from an input space $X$ to the output space $\Lambda = \mathbb{C}^n$, where $n\in\mathbb{N}$. To this end, we let $k,g$ be two scalar-valued reproducing kernels on another input space $Y$ and $T_p$ be mappings from $X$ to $Y$, $p\in\mathbb{N}_n$. Set

\[ K(x,y) := [k(T_px,T_qy) : p,q\in\mathbb{N}_n], \quad G(x,y) := [g(T_px,T_qy) : p,q\in\mathbb{N}_n], \quad x,y\in X. \tag{5.17} \]

It is known that $K,G$ defined above are indeed $\mathcal{L}(\mathbb{C}^n)$-valued reproducing kernels [6]. This also becomes clear in the proof below. We are interested in the conditions for $\mathcal{H}_K\preceq\mathcal{H}_G$ to hold.
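The construction (5.17) can be sketched in a few lines. The example below is only an illustration: the Gaussian kernel on $Y=\mathbb{R}$ and the two shift maps standing in for $T_1,T_2$ are assumptions chosen for concreteness, and the helper name `transformation_kernel` is hypothetical.

```python
import math

def gaussian(s, t):
    """A scalar-valued reproducing kernel on Y = R (assumed for illustration)."""
    return math.exp(-(s - t) ** 2)

def transformation_kernel(k, maps):
    """Return K with K(x, y)[p][q] = k(T_p x, T_q y), as in (5.17)."""
    def K(x, y):
        return [[k(tp(x), tq(y)) for tq in maps] for tp in maps]
    return K

# Hypothetical transformations T_1 (identity) and T_2 (shift by one) from X = R to Y = R.
maps = [lambda x: x, lambda x: x + 1.0]
K = transformation_kernel(gaussian, maps)

block = K(0.0, 0.5)   # a 2 x 2 matrix, i.e. an operator on C^2
```

Because the Gaussian kernel is real and symmetric, the blocks satisfy `K(x, y)[p][q] == K(y, x)[q][p]`, which is the matrix form of $K(x,y)^* = K(y,x)$.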

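Proposition 5.8 Let $K,G$ be defined by (5.17). Then $\mathcal{H}_K\preceq\mathcal{H}_G$ if and only if $\mathcal{H}_{\tilde k}\preceq\mathcal{H}_{\tilde g}$, where $\tilde k,\tilde g$ are the restrictions of $k,g$ to $\bigcup_{p=1}^n T_p(X)$. In particular, if

\[ \bigcup_{p=1}^n T_p(X) = Y \tag{5.18} \]

then $\mathcal{H}_K\preceq\mathcal{H}_G$ if and only if $\mathcal{H}_k\preceq\mathcal{H}_g$.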

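Proof: It is legitimate to assume that (5.18) holds true as otherwise, we may replace $Y$ by $\bigcup_{p=1}^n T_p(X)$, and $k,g$ by $\tilde k,\tilde g$, respectively.

Choose arbitrary feature maps and feature spaces $\Phi_1 : Y\to\mathcal{W}_1$ for $k$ and $\Phi_2 : Y\to\mathcal{W}_2$ for $g$ such that

\[ \overline{\mathrm{span}}\,\Phi_j(Y) = \mathcal{W}_j, \quad j=1,2. \tag{5.19} \]

By Proposition 3.6, $\mathcal{H}_K\preceq\mathcal{H}_G$ if and only if $\mathcal{H}_{\tilde K}\preceq\mathcal{H}_{\tilde G}$. We observe for all $x,y\in X$ and $\xi,\eta\in\mathbb{C}^n$ that

\[ \tilde K((x,\xi),(y,\eta)) = (K(x,y)\xi,\eta)_{\mathbb{C}^n} = \sum_{p=1}^n\sum_{q=1}^n \xi_p\overline{\eta_q}\,k(T_px,T_qy) = \sum_{p=1}^n\sum_{q=1}^n \xi_p\overline{\eta_q}\,(\Phi_1(T_px),\Phi_1(T_qy))_{\mathcal{W}_1} = \Bigl(\sum_{p=1}^n \xi_p\Phi_1(T_px),\ \sum_{q=1}^n \eta_q\Phi_1(T_qy)\Bigr)_{\mathcal{W}_1}. \]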
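Thus, $\tilde\Phi_1 : X\times\mathbb{C}^n\to\mathcal{W}_1$ defined by

\[ \tilde\Phi_1(x,\xi) := \sum_{p=1}^n \xi_p\Phi_1(T_px) \]

is a feature map for $\tilde K$. We next verify that $\mathrm{span}\{\tilde\Phi_1(x,\xi) : x\in X,\ \xi\in\mathbb{C}^n\}$ is dense in $\mathcal{W}_1$. Assume that $u\in\mathcal{W}_1$ is orthogonal to this linear span, that is,

\[ \Bigl(u,\ \sum_{p=1}^n \xi_p\Phi_1(T_px)\Bigr)_{\mathcal{W}_1} = 0 \ \text{ for all } x\in X,\ \xi\in\mathbb{C}^n. \]

Then we have $(u,\Phi_1(T_px))_{\mathcal{W}_1} = 0$ for all $x\in X$ and $p\in\mathbb{N}_n$. It follows from (5.18) and (5.19) that $u=0$. Similar facts hold for $G$.

By Lemma 3.2, $\mathcal{H}_{\tilde K}\preceq\mathcal{H}_{\tilde G}$ if and only if for every $u\in\mathcal{W}_1$ there exists $v\in\mathcal{W}_2$ such that

\[ \Bigl(u,\ \sum_{p=1}^n \xi_p\Phi_1(T_px)\Bigr)_{\mathcal{W}_1} = \Bigl(v,\ \sum_{p=1}^n \xi_p\Phi_2(T_px)\Bigr)_{\mathcal{W}_2} \ \text{ for all } x\in X,\ \xi\in\mathbb{C}^n \tag{5.20} \]

and

\[ \|u\|_{\mathcal{W}_1} = \|v\|_{\mathcal{W}_2}. \tag{5.21} \]

Recall also that $\mathcal{H}_k\preceq\mathcal{H}_g$ if and only if for all $u\in\mathcal{W}_1$ there exists some $v\in\mathcal{W}_2$ satisfying (5.21) and

\[ (u,\Phi_1(y))_{\mathcal{W}_1} = (v,\Phi_2(y))_{\mathcal{W}_2} \ \text{ for all } y\in Y. \tag{5.22} \]

Clearly, (5.22) implies (5.20). Conversely, if (5.20) holds true then we get that

\[ (u,\Phi_1(T_px))_{\mathcal{W}_1} = (v,\Phi_2(T_px))_{\mathcal{W}_2} \ \text{ for all } x\in X \text{ and } p\in\mathbb{N}_n, \]

which together with (5.18) implies (5.22). We conclude that $\mathcal{H}_{\tilde K}\preceq\mathcal{H}_{\tilde G}$ if and only if $\mathcal{H}_k\preceq\mathcal{H}_g$. □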

A more general case of refinement of transformation reproducing kernels is discussed below. It can 
be proved by arguments similar to those for the previous proposition. 

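Proposition 5.9 Let $T_p,S_p$ be mappings from $X$ to $Y$ and $k,g$ be scalar-valued reproducing kernels on $Y$. Define

\[ K(x,y) := [k(T_px,T_qy) : p,q\in\mathbb{N}_n], \quad G(x,y) := [g(S_px,S_qy) : p,q\in\mathbb{N}_n], \quad x,y\in X. \]

Suppose that for all $p\in\mathbb{N}_n$, $\mathrm{span}\{k(T_px,\cdot) : x\in X\}$ and $\mathrm{span}\{g(S_px,\cdot) : x\in X\}$ are dense in $\mathcal{H}_k$ and $\mathcal{H}_g$, respectively. Then $\mathcal{H}_K\preceq\mathcal{H}_G$ if and only if $\mathcal{H}_{k_p}\preceq\mathcal{H}_{g_p}$ for all $p\in\mathbb{N}_n$, where

\[ k_p(x,y) := k(T_px,T_py), \quad g_p(x,y) := g(S_px,S_py), \quad x,y\in X. \]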
5.4 Finite Hilbert-Schmidt reproducing kernels

We consider refinement of finite Hilbert-Schmidt reproducing kernels in this subsection. Let $B_j,C_j$ be invertible operators in $\mathcal{L}_+(\Lambda)$, $n\le m\in\mathbb{N}$, and $\Psi_j$, $j\in\mathbb{N}_m$, be scalar-valued reproducing kernels on the input space $X$. Define

\[ K(x,y) := \sum_{j=1}^n B_j\Psi_j(x,y), \quad G(x,y) := \sum_{j=1}^m C_j\Psi_j(x,y), \quad x,y\in X. \tag{5.23} \]

By the general integral representation (4.6) and Proposition 4.2, $K,G$ above are $\mathcal{L}(\Lambda)$-valued reproducing kernels on $X$. To ensure that representation (5.23) cannot be further simplified, we shall work under the assumption that

n^^ n n^^ = {o} for an j g n^, (5.24) 

where 

Theorem 5.10 Let $K,G$ be defined by (5.23), where $B_j,C_j\in\mathcal{L}_+(\Lambda)$ are invertible and $\Psi_j$, $j\in\mathbb{N}_m$, are scalar-valued reproducing kernels on $X$ satisfying (5.24). Then $\mathcal{H}_K\preceq\mathcal{H}_G$ if and only if $B_j = C_j$, $j\in\mathbb{N}_n$.

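Proof: We first find a feature map for $\tilde K$ and $\tilde G$. Let $\phi_j : X\to\mathcal{W}_j$ be an arbitrary feature map for $\Psi_j$ such that $\mathrm{span}\,\phi_j(X)$ is dense in $\mathcal{W}_j$, and denote by $\Lambda\otimes\mathcal{W}_j$ the tensor product of the Hilbert spaces $\Lambda$ and $\mathcal{W}_j$, $j\in\mathbb{N}_m$. The space $\Lambda\otimes\mathcal{W}_j$ is a Hilbert space with the inner product

\[ (\xi\otimes u,\eta\otimes v)_{\Lambda\otimes\mathcal{W}_j} := (\xi,\eta)_\Lambda(u,v)_{\mathcal{W}_j}, \quad \xi,\eta\in\Lambda,\ u,v\in\mathcal{W}_j. \]

Set $\mathcal{W}$ the orthogonal direct sum of $\Lambda\otimes\mathcal{W}_j$, $j\in\mathbb{N}_n$, whose inner product is defined by

\[ ((\xi_j\otimes u_j : j\in\mathbb{N}_n),(\eta_j\otimes v_j : j\in\mathbb{N}_n))_{\mathcal{W}} := \sum_{j=1}^n (\xi_j,\eta_j)_\Lambda(u_j,v_j)_{\mathcal{W}_j}, \quad \xi_j,\eta_j\in\Lambda,\ u_j,v_j\in\mathcal{W}_j,\ j\in\mathbb{N}_n. \]

We claim that $\Phi : X\times\Lambda\to\mathcal{W}$ defined by

\[ \Phi(x,\xi) := (\sqrt{B_j}\,\xi\otimes\phi_j(x) : j\in\mathbb{N}_n), \quad x\in X,\ \xi\in\Lambda \]

is a feature map for $\tilde K$. Here, $\sqrt{B_j}$ denotes the unique operator $A$ in $\mathcal{L}_+(\Lambda)$ such that $A^2 = B_j$. We verify for all $x,y\in X$ and $\xi,\eta\in\Lambda$ that

\[ (\Phi(x,\xi),\Phi(y,\eta))_{\mathcal{W}} = \sum_{j=1}^n (\sqrt{B_j}\,\xi,\sqrt{B_j}\,\eta)_\Lambda(\phi_j(x),\phi_j(y))_{\mathcal{W}_j} = \sum_{j=1}^n (B_j\xi,\eta)_\Lambda\Psi_j(x,y) = (K(x,y)\xi,\eta)_\Lambda = \tilde K((x,\xi),(y,\eta)). \]

We next show that the denseness condition

\[ \overline{\mathrm{span}}\{\Phi(x,\xi) : x\in X,\ \xi\in\Lambda\} = \mathcal{W} \tag{5.25} \]

is satisfied. To this end, suppose that we have $\eta_j\otimes u_j\in\Lambda\otimes\mathcal{W}_j$, $j\in\mathbb{N}_n$, such that

\[ ((\eta_j\otimes u_j : j\in\mathbb{N}_n),\Phi(x,\xi))_{\mathcal{W}} = \sum_{j=1}^n (\eta_j,\sqrt{B_j}\,\xi)_\Lambda(u_j,\phi_j(x))_{\mathcal{W}_j} = 0 \ \text{ for all } x\in X \text{ and } \xi\in\Lambda. \]

Note that $(u_j,\phi_j(\cdot))_{\mathcal{W}_j}\in\mathcal{H}_{\Psi_j}$ for each $j\in\mathbb{N}_n$. We hence obtain by (5.24) that

\[ (\eta_j,\sqrt{B_j}\,\xi)_\Lambda(u_j,\phi_j(x))_{\mathcal{W}_j} = 0 \ \text{ for all } j\in\mathbb{N}_n,\ x\in X \text{ and } \xi\in\Lambda. \]

By the denseness of $\phi_j(X)$ in $\mathcal{W}_j$,

\[ (\eta_j,\sqrt{B_j}\,\xi)_\Lambda u_j = 0 \ \text{ for all } j\in\mathbb{N}_n \text{ and } \xi\in\Lambda. \]

We thus have for all $j\in\mathbb{N}_n$ either $u_j = 0$ or

\[ (\sqrt{B_j}\,\eta_j,\xi)_\Lambda = (\eta_j,\sqrt{B_j}\,\xi)_\Lambda = 0 \ \text{ for all } \xi\in\Lambda. \]

In the latter case, we have that $\sqrt{B_j}\,\eta_j = 0$. As $B_j$ is invertible, so is $\sqrt{B_j}$, and it follows that $\eta_j = 0$. In either case, we have that $\eta_j\otimes u_j = 0$ for all $j\in\mathbb{N}_n$. Equation (5.25) hence holds true. Similar facts hold for $G$.

By Proposition 3.6, $\mathcal{H}_K\preceq\mathcal{H}_G$ is equivalent to $\mathcal{H}_{\tilde K}\preceq\mathcal{H}_{\tilde G}$, which by the above discussion and Lemma 3.2 holds true if and only if for all $\xi_j\otimes u_j\in\Lambda\otimes\mathcal{W}_j$, $j\in\mathbb{N}_n$, there exist unique $\eta_j\otimes v_j\in\Lambda\otimes\mathcal{W}_j$, $j\in\mathbb{N}_m$, such that

\[ \sum_{j=1}^n (\xi_j,\sqrt{B_j}\,\xi)_\Lambda(u_j,\phi_j(x))_{\mathcal{W}_j} = \sum_{j=1}^m (\eta_j,\sqrt{C_j}\,\xi)_\Lambda(v_j,\phi_j(x))_{\mathcal{W}_j} \ \text{ for all } \xi\in\Lambda \text{ and } x\in X \tag{5.26} \]

and

\[ \sum_{j=1}^n (\xi_j,\xi_j)_\Lambda(u_j,u_j)_{\mathcal{W}_j} = \sum_{j=1}^m (\eta_j,\eta_j)_\Lambda(v_j,v_j)_{\mathcal{W}_j}. \tag{5.27} \]

Let $\xi_j\otimes u_j\in\Lambda\otimes\mathcal{W}_j$, $j\in\mathbb{N}_n$. If $B_j = C_j$ for $j\in\mathbb{N}_n$ then we set $\eta_j := \xi_j$ and $v_j := u_j$ for $j\in\mathbb{N}_n$, and $\eta_j = 0$ and $v_j = 0$ for $n+1\le j\le m$. Clearly, such a choice satisfies equations (5.26) and (5.27). Therefore, $\mathcal{H}_K\preceq\mathcal{H}_G$. Conversely, suppose that $\mathcal{H}_K\preceq\mathcal{H}_G$, that is, there exist $\eta_j\otimes v_j\in\Lambda\otimes\mathcal{W}_j$, $j\in\mathbb{N}_m$, that satisfy equations (5.26) and (5.27). Note that such $\eta_j\otimes v_j$ are unique by the denseness condition satisfied by the feature map for $G$. By (5.24), equation (5.26) implies that

\[ (\xi_j,\sqrt{B_j}\,\xi)_\Lambda(u_j,\phi_j(x))_{\mathcal{W}_j} = (\eta_j,\sqrt{C_j}\,\xi)_\Lambda(v_j,\phi_j(x))_{\mathcal{W}_j} \ \text{ for all } \xi\in\Lambda \text{ and } x\in X,\ j\in\mathbb{N}_n \]

and

\[ (\eta_j,\sqrt{C_j}\,\xi)_\Lambda(v_j,\phi_j(x))_{\mathcal{W}_j} = 0 \ \text{ for all } \xi\in\Lambda \text{ and } x\in X,\ n+1\le j\le m. \]

By the uniqueness of $\eta_j\otimes v_j\in\Lambda\otimes\mathcal{W}_j$, $j\in\mathbb{N}_m$, we must have that $\eta_j\otimes v_j = (\sqrt{C_j}^{\,-1}\sqrt{B_j}\,\xi_j)\otimes u_j$ for $j\in\mathbb{N}_n$, and $\eta_j\otimes v_j = 0$ for $n+1\le j\le m$. This together with (5.27) yields that

\[ \sum_{j=1}^n (\xi_j,\xi_j)_\Lambda(u_j,u_j)_{\mathcal{W}_j} = \sum_{j=1}^n (\sqrt{B_j}\,C_j^{-1}\sqrt{B_j}\,\xi_j,\xi_j)_\Lambda(u_j,u_j)_{\mathcal{W}_j}. \]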
By successively making $\xi_j\otimes u_j\ne 0$ and $\xi_k\otimes u_k = 0$ for $k\in\mathbb{N}_n\setminus\{j\}$, for $j\in\mathbb{N}_n$, we reach that

\[ (\xi_j,\xi_j)_\Lambda = (\sqrt{B_j}\,C_j^{-1}\sqrt{B_j}\,\xi_j,\xi_j)_\Lambda \ \text{ for all } \xi_j\in\Lambda \text{ and } j\in\mathbb{N}_n. \]

As $\sqrt{B_j}\,C_j^{-1}\sqrt{B_j}$ is hermitian, it equals the identity operator on $\Lambda$. It follows that $B_j = C_j$ for all $j\in\mathbb{N}_n$. The proof is complete. □

As a corollary of Theorem 5.10, we obtain an orthogonal decomposition of $\mathcal{H}_K$.

Corollary 5.11 Let $K$ be defined by (5.23), where $B_j$ are invertible and $\Psi_j$, $j\in\mathbb{N}_n$, satisfy (5.24). Then

n 

and 

The simplest case of (5.23) occurs when $\mathcal{H}_{\Psi_j}$ is of dimension 1 for $j\in\mathbb{N}_m$, which is covered below.

Corollary 5.12 Let $B_j,C_k\in\mathcal{L}_+(\Lambda)$ be invertible for $j\in\mathbb{N}_n$ and $k\in\mathbb{N}_m$, and $\psi_k : X\to\mathbb{C}$, $k\in\mathbb{N}_m$, be linearly independent. Set

\[ K(x,y) := \sum_{j=1}^n B_j\psi_j(x)\overline{\psi_j(y)}, \quad G(x,y) := \sum_{k=1}^m C_k\psi_k(x)\overline{\psi_k(y)}, \quad x,y\in X. \]

Then $\mathcal{H}_K\preceq\mathcal{H}_G$ if and only if $B_j = C_j$ for all $j\in\mathbb{N}_n$.

More generally, we might consider $K,G$ defined by two distinct classes of linearly independent functions from $X$ to $\mathbb{C}$. The result below can be proved using arguments similar to those for Theorem 5.10.

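Proposition 5.13 Let $n\le m\in\mathbb{N}$, $B_j,C_k\in\mathcal{L}_+(\Lambda)$ be invertible for $j\in\mathbb{N}_n$ and $k\in\mathbb{N}_m$, and $\{\psi_j : j\in\mathbb{N}_n\}$ and $\{\varphi_k : k\in\mathbb{N}_m\}$ be two classes of linearly independent functions from $X$ to $\mathbb{C}$. Set

\[ K(x,y) := \sum_{j=1}^n B_j\psi_j(x)\overline{\psi_j(y)}, \quad G(x,y) := \sum_{k=1}^m C_k\varphi_k(x)\overline{\varphi_k(y)}, \quad x,y\in X. \]

Then $\mathcal{H}_K\preceq\mathcal{H}_G$ if and only if

(1) $\psi_j\in\mathrm{span}\{\varphi_k : k\in\mathbb{N}_m\}$ for all $j\in\mathbb{N}_n$;

(2) the coefficients $\lambda_{jl}\in\mathbb{C}$ in the linear span

\[ \psi_j = \sum_{l=1}^m \lambda_{jl}\varphi_l, \quad j\in\mathbb{N}_n \]

satisfy

\[ \sum_{l=1}^m \lambda_{jl}\overline{\lambda_{kl}}\,C_l^{-1} = \delta_{j,k}B_j^{-1} \ \text{ for all } j,k\in\mathbb{N}_n. \]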
We close this section with several concrete examples of finite Hilbert-Schmidt reproducing kernels of the form described in Corollary 5.12 and Proposition 5.13.

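• polynomial kernels:

\[ K(x,y) := \sum_{j=1}^n x^{\alpha_j}y^{\alpha_j}B_j, \quad x,y\in\mathbb{R}^d, \]

where $\alpha_j$ are multi-indices and $B_j$ are invertible operators in $\mathcal{L}_+(\Lambda)$, or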

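\[ K(x,y) := \sum_{j=1}^n (x\cdot y)^{\beta_j}B_j, \quad x,y\in\mathbb{R}^d, \]

where $\beta_j$ are nonnegative integers.

• exponential kernels:

\[ K(x,y) := \sum_{j=1}^n e^{i(x-y)\cdot t_j}B_j, \quad x,y\in\mathbb{R}^d, \]

where $t_j\in\mathbb{R}^d$.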



6 Existence 

This section is devoted to the existence of nontrivial refinements of operator-valued reproducing kernels. Most of the results to be presented here are straightforward extensions of those in the scalar-valued case [31].

Let $X$ be the input space and $\Lambda$ be a Hilbert space. The reproducing kernels under consideration are $\mathcal{L}(\Lambda)$-valued.

Proposition 6.1 There does not exist a nontrivial refinement of an $\mathcal{L}(\Lambda)$-valued reproducing kernel $K$ on $X$ if and only if $\mathcal{H}_K = \Lambda^X$, the set of all the functions from $X$ to $\Lambda$. If the cardinality of $X$ is infinite then every $\mathcal{L}(\Lambda)$-valued reproducing kernel on $X$ has a nontrivial refinement.

Surprisingly, nontrivial results about the existence appear when X is of finite cardinality. 

Proposition 6.2 Let $X$ consist of finitely many points $x_j$, $j\in\mathbb{N}_n$, for some $n\in\mathbb{N}$. A necessary condition for an $\mathcal{L}(\Lambda)$-valued reproducing kernel $K$ on $X$ to have no nontrivial refinements is that

\[ \sum_{j=1}^n\sum_{k=1}^n (K(x_j,x_k)\xi_j,\xi_k)_\Lambda > 0 \ \text{ for all } \xi_j\in\Lambda,\ j\in\mathbb{N}_n, \text{ with } \sum_{j=1}^n \|\xi_j\|_\Lambda > 0. \tag{6.1} \]

A sufficient condition for $K$ to have no nontrivial refinements is that

\[ \sum_{j=1}^n\sum_{k=1}^n (K(x_j,x_k)\xi_j,\xi_k)_\Lambda \ge \lambda\sum_{j=1}^n \|\xi_j\|^2_\Lambda \ \text{ for all } \xi_j\in\Lambda,\ j\in\mathbb{N}_n \tag{6.2} \]

for some constant $\lambda > 0$. Consequently, if $\Lambda$ is finite-dimensional then $K$ does not have a nontrivial refinement if and only if (6.1) holds true.
Proof: Suppose that there exist $\xi_j\in\Lambda$, $j\in\mathbb{N}_n$, at least one of which is nonzero, such that

\[ \sum_{j=1}^n\sum_{k=1}^n (K(x_j,x_k)\xi_j,\xi_k)_\Lambda = 0. \]

This implies that

\[ \sum_{j=1}^n K(x_j,\cdot)\xi_j = 0. \]

We get by the reproducing property that for all $f\in\mathcal{H}_K$

\[ \sum_{j=1}^n (f(x_j),\xi_j)_\Lambda = \Bigl(f,\ \sum_{j=1}^n K(x_j,\cdot)\xi_j\Bigr)_{\mathcal{H}_K} = 0. \]

As a consequence, $\mathcal{H}_K$ does not contain the function $f : X\to\Lambda$ taking values $f(x_j) = \xi_j$ for $j\in\mathbb{N}_n$. By Proposition 6.1, there exist nontrivial refinements for $K$ on $X$.

Suppose that (6.2) holds true for some positive constant $\lambda$. Assume that $\mathcal{H}_K$ is a proper subset of $\Lambda^X$. Then there exists some nonzero vector $(\xi_k : k\in\mathbb{N}_n)\in\Lambda^n$ orthogonal to $(f(x_k) : k\in\mathbb{N}_n)$ in $\Lambda^n$ for all $f\in\mathcal{H}_K$. Letting $f = \sum_{j=1}^n K(x_j,\cdot)\xi_j$ yields that

\[ \sum_{j=1}^n\sum_{k=1}^n (K(x_j,x_k)\xi_j,\xi_k)_\Lambda = \sum_{k=1}^n (f(x_k),\xi_k)_\Lambda = 0, \]

contradicting (6.2).

We complete the proof by pointing out that when $\Lambda$ is finite-dimensional, (6.1) and (6.2) are equivalent. □
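When $X$ is finite and $\Lambda$ is finite-dimensional, condition (6.1) says that the block Gram matrix $[K(x_j,x_k)]$ is strictly positive definite, which can be tested by attempting a Cholesky factorization. The following is a minimal sketch assuming $\Lambda=\mathbb{R}^m$ and real symmetric blocks; the function names and the two example kernels are hypothetical.

```python
import math

def block_gram(kernel, points):
    """Stack the m x m blocks K(x_j, x_k) into one (n*m) x (n*m) real matrix."""
    blocks = [[kernel(xj, xk) for xk in points] for xj in points]
    n, m = len(points), len(blocks[0][0])
    return [[blocks[j][k][a][b] for k in range(n) for b in range(m)]
            for j in range(n) for a in range(m)]

def is_positive_definite(mat, tol=1e-10):
    """Attempt a Cholesky factorization; succeed only if every pivot exceeds tol."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = mat[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False      # (6.1) fails for some nonzero (xi_1, ..., xi_n)
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True

def gaussian_block(x, y):
    """Gaussian kernel times the 2 x 2 identity (assumed example)."""
    v = math.exp(-(x - y) ** 2)
    return [[v, 0.0], [0.0, v]]

def linear_block(x, y):
    """Linear kernel x*y times the 2 x 2 identity (assumed example)."""
    return [[x * y, 0.0], [0.0, x * y]]

no_refinement = is_positive_definite(block_gram(gaussian_block, [0.0, 1.0, 2.0]))
has_refinement = not is_positive_definite(block_gram(linear_block, [1.0, 2.0, 3.0]))
```

In this sketch the Gaussian example satisfies (6.1) on three distinct points, so by Proposition 6.2 it admits no nontrivial refinement there, while the rank-deficient linear example violates (6.1) and therefore can be refined.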



It is worthwhile to note that when $\Lambda$ is infinite-dimensional, condition (6.1) might not be sufficient for $K$ to have no nontrivial refinement. We give a concrete example to illustrate this.

Let $X$ be a singleton $\{x\}$, $\Lambda := \ell^2(\mathbb{N})$ consisting of square-summable sequences indexed by $\mathbb{N}$, and $K(x,x)$ be the operator $T$ on $\ell^2(\mathbb{N})$ defined by



\[ Ta := \Bigl(\frac{a_j}{j} : j\in\mathbb{N}\Bigr), \quad a\in\ell^2(\mathbb{N}). \]



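Apparently, $T\in\mathcal{L}_+(\ell^2(\mathbb{N}))$ and condition (6.1) is satisfied. Let $f\in\mathcal{H}_K$. Then there exist $a_n\in\ell^2(\mathbb{N})$, $n\in\mathbb{N}$, such that $K(x,\cdot)a_n$ converges to $f$ in $\mathcal{H}_K$. Being a Cauchy sequence in $\mathcal{H}_K$, $\{K(x,\cdot)a_n : n\in\mathbb{N}\}$ satisfies

\[ \lim_{n,m\to\infty}\|K(x,\cdot)a_n - K(x,\cdot)a_m\|_{\mathcal{H}_K} = 0. \]

Note that

\[ \|K(x,\cdot)a_n - K(x,\cdot)a_m\|^2_{\mathcal{H}_K} = (K(x,\cdot)(a_n-a_m),K(x,\cdot)(a_n-a_m))_{\mathcal{H}_K} = (K(x,x)(a_n-a_m),a_n-a_m)_{\ell^2(\mathbb{N})} = (T(a_n-a_m),a_n-a_m)_{\ell^2(\mathbb{N})}. \]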
Combining the above two equations yields that $\sqrt{T}a_n$ converges to some $b\in\ell^2(\mathbb{N})$. We now have for each $c\in\ell^2(\mathbb{N})$ that

\[ (f(x),c)_{\ell^2(\mathbb{N})} = (f,K(x,\cdot)c)_{\mathcal{H}_K} = \lim_{n\to\infty}(K(x,\cdot)a_n,K(x,\cdot)c)_{\mathcal{H}_K} = \lim_{n\to\infty}(K(x,x)a_n,c)_{\ell^2(\mathbb{N})} = \lim_{n\to\infty}(Ta_n,c)_{\ell^2(\mathbb{N})} = (\sqrt{T}b,c)_{\ell^2(\mathbb{N})}, \]

which implies that $f(x) = \sqrt{T}b$. Since this is true for an arbitrary function $f\in\mathcal{H}_K$, the function $g : X\to\Lambda$ defined by

\[ g(x) := \Bigl(\frac{1}{j} : j\in\mathbb{N}\Bigr) \]

is not in $\mathcal{H}_K$. Thus, $K$ has a nontrivial refinement on $X$.
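The mechanism of the example is that every value $f(x)$ with $f \in \mathcal{H}_K$ lies in the range of $\sqrt{T}$, and this range is a proper subspace of $\ell^2(\mathbb{N})$. As a numerical illustration, assume for concreteness the diagonal operator $Ta = (a_j/j : j \in \mathbb{N})$ and the sequence $g = (1/j : j \in \mathbb{N})$. Both are assumptions made for this sketch, as is the function name `partial_norms_sq`. The only candidate $b$ with $\sqrt{T}b = g$ is $b_j = 1/\sqrt{j}$, whose squared norm is the divergent harmonic series.

```python
import numpy as np

def partial_norms_sq(N):
    """Squared l2 norms of the first N entries of g = (1/j) and of b = (1/sqrt(j)).

    Assumes the diagonal operator T a = (a_j / j), so that sqrt(T) b = (b_j / sqrt(j)).
    """
    j = np.arange(1, N + 1, dtype=float)
    g = 1.0 / j
    b = np.sqrt(j) * g  # the only sequence mapped onto g by sqrt(T)
    return float(np.sum(g ** 2)), float(np.sum(b ** 2))

for N in (10 ** 3, 10 ** 6):
    g_sq, b_sq = partial_norms_sq(N)
    print(N, g_sq, b_sq)
```

The first column of norms stabilizes near $\pi^2/6$, so $g$ belongs to $\ell^2(\mathbb{N})$, while the second grows like $\log N$, so no square-summable $b$ satisfies $\sqrt{T}b = g$.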

In the process of refining an operator-valued reproducing kernel, it is usually desirable to preserve favorable properties of the original kernel. We shall show that this is feasible as far as continuity and universality of operator-valued reproducing kernels are concerned. Let $X$ be a metric space and $K$ an $\mathcal{L}(\Lambda)$-valued reproducing kernel that is continuous from $X \times X$ to $\mathcal{L}(\Lambda)$ when the latter is equipped with the operator norm. Then one sees that $\mathcal{H}_K$ consists of continuous functions from $X$ to $\Lambda$. For each compact subset $Z \subseteq X$, denote by $C(Z,\Lambda)$ the Banach space of all the continuous functions from $Z$ to $\Lambda$ with the norm

$$\|f\|_{C(Z,\Lambda)} := \max_{x \in Z} \|f(x)\|_\Lambda, \quad f \in C(Z,\Lambda).$$

Following [21] and [6], we call $K$ a universal kernel on $X$ if for all compact sets $Z \subseteq X$ and all continuous functions $f : X \to \Lambda$ there exist

$$f_n \in \operatorname{span}\{K(x,\cdot)\xi : x \in Z,\ \xi \in \Lambda\}, \quad n \in \mathbb{N},$$

such that

$$\lim_{n\to\infty} \|f_n - f\|_{C(Z,\Lambda)} = 0.$$

In other words, $K$ is universal if for all compact subsets $Z \subseteq X$, the closure of $\operatorname{span}\{K(x,\cdot)\xi : x \in Z,\ \xi \in \Lambda\}$ in $C(Z,\Lambda)$ equals the whole space $C(Z,\Lambda)$.

For the preservation of continuity, we have the following affirmative result, whose proof is similar to the scalar-valued case [31].

Proposition 6.3 Let $X$ be a metric space with infinite cardinality. Then every continuous $\mathcal{L}(\Lambda)$-valued reproducing kernel on $X$ has a nontrivial continuous refinement.

The following lemma about universality has been proved in [6], and in [21] in the scalar-valued case. We provide a simplified proof here.

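Lemma 6.4 Let $K$ be a continuous $\mathcal{L}(\Lambda)$-valued reproducing kernel on $X$ with a feature map representation in which $\Phi : X \to \mathcal{L}(\Lambda, \mathcal{W})$ is continuous. Then for each compact subset $Z \subseteq X$,

$$\overline{\operatorname{span}}\{K(x,\cdot)\xi : x \in Z,\ \xi \in \Lambda\} = \overline{\{\Phi(\cdot)^* u : u \in \mathcal{W}\}},$$

where the closures are relative to the norm in $C(Z,\Lambda)$.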
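Proof: All the closures to appear in the proof are relative to the norm in $C(Z,\Lambda)$. Let $K_Z$ be the restriction of $K$ on $Z$. Then the restriction of $\Phi$ on $Z$ remains a feature map for $K_Z$. By Lemma 3.2,

$$\mathcal{H}_{K_Z} = \{\Phi(\cdot)^* u : u \in \mathcal{W}\}. \tag{6.3}$$

It hence suffices to show that

$$\overline{\operatorname{span}}\{K(x,\cdot)\xi : x \in Z,\ \xi \in \Lambda\} = \overline{\operatorname{span}}\{K_Z(x,\cdot)\xi : x \in Z,\ \xi \in \Lambda\} = \overline{\mathcal{H}_{K_Z}}.$$

As $\operatorname{span}\{K_Z(x,\cdot)\xi : x \in Z,\ \xi \in \Lambda\} \subseteq \mathcal{H}_{K_Z}$,

$$\overline{\operatorname{span}}\{K_Z(x,\cdot)\xi : x \in Z,\ \xi \in \Lambda\} \subseteq \overline{\mathcal{H}_{K_Z}}. \tag{6.4}$$

On the other hand, for each $f \in \mathcal{H}_{K_Z}$ there exist $f_n \in \operatorname{span}\{K_Z(x,\cdot)\xi : x \in Z,\ \xi \in \Lambda\}$, $n \in \mathbb{N}$, that converge to $f$ in the norm of $\mathcal{H}_{K_Z}$. Since $\|h(x)\|_\Lambda \le \|K(x,x)\|^{1/2}\|h\|_{\mathcal{H}_{K_Z}}$ for all $h \in \mathcal{H}_{K_Z}$ and $K$ is continuous on the compact set $Z \times Z$, it follows that $f_n$ converges to $f$ in the norm of $C(Z,\Lambda)$. Therefore, $f \in \overline{\operatorname{span}}\{K_Z(x,\cdot)\xi : x \in Z,\ \xi \in \Lambda\}$, implying that

$$\mathcal{H}_{K_Z} \subseteq \overline{\operatorname{span}}\{K_Z(x,\cdot)\xi : x \in Z,\ \xi \in \Lambda\}. \tag{6.5}$$

Combining equations (6.3), (6.4), and (6.5) proves the result. □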

The following positive result about universality can be proved by Lemma 6.4 and arguments similar to those used in Proposition 14 of [31].

Proposition 6.5 Let $X$ be a metric space and $K$ a continuous universal $\mathcal{L}(\Lambda)$-valued reproducing kernel on $X$. Then every continuous refinement of $K$ on $X$ remains universal.



7 Numerical Experiments 

We present in this final section two numerical experiments on the application of refinement of operator-valued reproducing kernels to multi-task learning. Suppose that $f_0$ is a function from the input space $X$ to the output space $\Lambda$ that we desire to learn from its finite sample data $\{(x_j, \xi_j) : j \in \mathbb{N}_m\} \subseteq X \times \Lambda$. Here $m$ is the number of sampling points and

$$\xi_j = f_0(x_j) + \delta_j, \quad j \in \mathbb{N}_m,$$

where $\delta_j \in \Lambda$ is the noise dominated by some unknown probability measure. To deal with the noise and have an acceptable generalization error, we use the following regularization network

$$\min_{f \in \mathcal{H}_K} \frac{1}{m}\sum_{j=1}^m \|f(x_j) - \xi_j\|_\Lambda^2 + \sigma\|f\|_{\mathcal{H}_K}^2, \tag{7.1}$$

where $K$ is a chosen $\mathcal{L}(\Lambda)$-valued reproducing kernel on $X$ and $\sigma > 0$ is the regularization parameter. Our experiments will be designed so that underfitting and overfitting both have the chance to occur. To echo the motivations in Section 2, when underfitting happens in the first experiment, we shall find a refinement $G$ of $K$ aiming at improving the performance of the minimizer of (7.1) in prediction. On the other hand, when overfitting appears in the second experiment, we shall then find an $\mathcal{L}(\Lambda)$-valued reproducing kernel $L$ on $X$ such that $\mathcal{H}_L \preceq \mathcal{H}_K$ with the same purpose.
Before moving on to the experiments, we make a remark on how (7.1) can be solved. The issue has been understood in the work [20]. We say that $K$ is strictly positive-definite if for all finitely many pairwise distinct points $y_j \in X$, $j \in \mathbb{N}_p$, and for all $\eta_j \in \Lambda$, $j \in \mathbb{N}_p$, not all of which are zero,

$$\sum_{j=1}^p \sum_{k=1}^p (K(y_j, y_k)\eta_j, \eta_k)_\Lambda > 0.$$

If $K$ is strictly positive-definite then the minimizer $f_K$ of (7.1) has the form

$$f_K = \sum_{j=1}^m K(x_j, \cdot)\eta_j, \tag{7.2}$$

where the $\eta_j$'s satisfy

$$\sum_{k=1}^m K(x_k, x_j)\eta_k + m\sigma\eta_j = \xi_j, \quad j \in \mathbb{N}_m. \tag{7.3}$$
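When $\Lambda = \mathbb{R}^n$, equations (7.2) and (7.3) reduce to one linear system of size $mn$. The following Python sketch shows one way to assemble and solve it. The function names `solve_coefficients` and `evaluate`, the sample data, and the separable Gaussian kernel with unit bandwidth are all assumptions made for illustration, not details taken from the text.

```python
import numpy as np

def solve_coefficients(K, xs, xis, sigma):
    """Solve (7.3) for eta_1, ..., eta_m.

    K(x, y) returns an (n, n) array, xs has shape (m,), xis has shape (m, n).
    """
    m, n = xis.shape
    gram = np.zeros((m * n, m * n))
    for j in range(m):
        for k in range(m):
            # Row block j collects the operators K(x_k, x_j) acting on eta_k.
            gram[j * n:(j + 1) * n, k * n:(k + 1) * n] = K(xs[k], xs[j])
    eta = np.linalg.solve(gram + m * sigma * np.eye(m * n), xis.reshape(-1))
    return eta.reshape(m, n)

def evaluate(K, xs, eta, x):
    """Evaluate f_K(x) = sum_j K(x_j, x) eta_j as in (7.2)."""
    return sum(K(xj, x) @ eta_j for xj, eta_j in zip(xs, eta))

# Hypothetical data: a separable Gaussian kernel with an assumed bandwidth.
S = np.array([[2.0, 0.5], [0.5, 1.0]])
K = lambda x, y: S * np.exp(-0.5 * (x - y) ** 2)
xs = np.linspace(-1.0, 1.0, 5)
xis = np.column_stack([np.abs(xs), np.exp(-xs)])
sigma = 1e-3
eta = solve_coefficients(K, xs, xis, sigma)
print(evaluate(K, xs, eta, 0.25))
```

The term $m\sigma$ on the diagonal keeps the system well-posed even when the Gram matrix itself is badly conditioned, as is typical for Gaussian kernels.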

7.1 Experiment one: underfitting 

The vector-valued function to be learned from finite examples is from the input space $X = [-1,1]$ to the output space $\Lambda = \mathbb{R}^n$, where $n \in \mathbb{N}$. Specifically, it has the form

$$f_0(x) := \bigl(a_k|x - b_k| + c_k e^{-d_k x} : k \in \mathbb{N}_n\bigr), \quad x \in [-1,1], \tag{7.4}$$

where $a, b, c, d$ are constant vectors to be randomly generated. The $\mathcal{L}_+(\mathbb{R}^n)$-valued reproducing kernel that we shall use in the regularization network (7.1) is a Gaussian kernel

i^(x,2/) :=5expf-^^^^y x,y£[-l,l], 



where $S \in \mathcal{L}_+(\mathbb{R}^n)$ is strictly positive-definite. It can be identified by Lemma 3.2 that functions in $\mathcal{H}_K$ are of the form $\sqrt{S}v$, where $v$ is an $\mathbb{R}^n$-valued function on $[-1,1]$ such that for each $k \in \mathbb{N}_n$, its $k$-th component $v_k$ is the restriction on $[-1,1]$ of a square Lebesgue integrable function $u_k$ on $\mathbb{R}$ such that



j |nfc(t)|^exp dt <+oo. 



Here $\hat{u}_k$ denotes the Fourier transform of $u_k$ given as

$$\hat{u}_k(t) := \int_{\mathbb{R}} e^{-itx}u_k(x)\,dx, \quad t \in \mathbb{R}.$$

Therefore, such a function $u_k$ can be extended to an analytic function of finite order on the complex plane. In particular, it implies that each component $v_k$ of $v$ is real-analytic on $[-1,1]$. As a result, components of functions in $\mathcal{H}_K$ are real-analytic. The function $f_0$ to be approximated is defined by (7.4). We see that while the exponential component $e^{-d_k x}$ is real-analytic, the first component $|x - b_k|$ is not even continuously differentiable. Underfitting is hence expected. If this is indeed observed then a remedy is to use the refinement of $K$ given by

G{x,y) :=5expf-^^^^^ +T{l + xy)\ x,yG [-1,1], 






where $T \in \mathcal{L}_+(\mathbb{R}^n)$ is also strictly positive-definite. It can be verified that $\mathcal{H}_K \cap \mathcal{H}_{G-K} = \{0\}$. By Proposition 3.1, $G$ is a nontrivial refinement of $K$. Furthermore, as low order polynomials are introduced, the ability of functions in $\mathcal{H}_G$ to approximate the function $|x - b_k|$ is expected to be superior to that of functions in $\mathcal{H}_K$. We perform extensive numerical simulations to confirm these conjectures.

The dimension $n$ will be chosen from $\{2, 4, 8, 16\}$. The number $m$ of sampling points will be set to be 30. The sampling points $x_j$, $j \in \mathbb{N}_m$, will be randomly sampled from $[-1,1]$ by the uniform distribution and the outputs are generated by

$$\xi_j = f_0(x_j) + \delta_j, \quad j \in \mathbb{N}_m, \tag{7.5}$$

where $\delta_j$ are vectors whose components will be randomly generated by the uniform distribution on $[-\delta, \delta]$ with $\delta$ being the noise level selected from $\{0.1, 0.3, 0.5\}$. For each dimension $n \in \{2, 4, 8, 16\}$ and noise level $\delta \in \{0.1, 0.3, 0.5\}$, we run 50 simulations. In each of the simulations, we do the following:

1. the components of the coefficient vectors $a, b, c, d$ in the function $f_0$ given by (7.4) are randomly generated by the uniform distribution on $[1,3]$, $[-1,1]$, $[-2,2]$, and $[0,3]$, respectively;

2. the sampling points are randomly sampled from $[-1,1]$ by the uniform distribution and the outputs are then generated by (7.5);

3. the matrices $S$ and $T$ are given by $S = A'A$ and $T = B'B$ where $A, B$ are $n \times n$ real matrices whose components are randomly sampled from $[1,3]$ by the uniform distribution;

4. we then solve for the minimizer $f_K$ of (7.1) by (7.2) and (7.3);

5. for the refinement kernel $G$, we also obtain $f_G$ as the minimizer of

$$\min_{f \in \mathcal{H}_G} \frac{1}{m}\sum_{j=1}^m \|f(x_j) - \xi_j\|_\Lambda^2 + \sigma\|f\|_{\mathcal{H}_G}^2; \tag{7.6}$$

6. the regularization parameters in (7.1) and (7.6) are optimally chosen so that the relative square approximation errors

$$\mathcal{E}_K := \frac{\int_{-1}^1 \|f_K(t) - f_0(t)\|^2\,dt}{\int_{-1}^1 \|f_0(t)\|^2\,dt}, \qquad \mathcal{E}_G := \frac{\int_{-1}^1 \|f_G(t) - f_0(t)\|^2\,dt}{\int_{-1}^1 \|f_0(t)\|^2\,dt} \tag{7.7}$$

are minimized, respectively.
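The six steps above can be sketched for a single simulation as follows. This is a minimal illustration under stated assumptions rather than the code used for the reported results: the Gaussian bandwidth in `gauss`, the polynomial factor `poly` taken as $1 + xy$, the exponent $e^{-d_k x}$ in the target, the grid of candidate regularization parameters, and the quadrature grid for the relative error are all hypothetical choices, as are the function names.

```python
import numpy as np

def gauss(x, y):
    """Assumed scalar Gaussian factor exp(-(x - y)^2 / 2)."""
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2)

def poly(x, y):
    """Assumed low-order polynomial factor 1 + x y."""
    return 1.0 + x[:, None] * y[None, :]

def fit_predict(parts, xs, xis, sigma, grid):
    """Solve (7.3) for the kernel sum_i A_i * k_i(x, y) and evaluate the minimizer on grid."""
    m, n = xis.shape
    gram = sum(np.kron(k(xs, xs), A) for k, A in parts)
    eta = np.linalg.solve(gram + m * sigma * np.eye(m * n), xis.reshape(-1))
    cross = sum(np.kron(k(grid, xs), A) for k, A in parts)
    return (cross @ eta).reshape(len(grid), n)

def relative_error(pred, truth):
    """Discrete version of (7.7); the uniform grid spacing cancels in the ratio."""
    return float(np.sum((pred - truth) ** 2) / np.sum(truth ** 2))

def one_instance(n, delta, rng, m=30, sigmas=np.logspace(-8, 0, 17)):
    # Step 1: random coefficients of the target function (7.4).
    a, b = rng.uniform(1, 3, n), rng.uniform(-1, 1, n)
    c, d = rng.uniform(-2, 2, n), rng.uniform(0, 3, n)
    f0 = lambda x: a * np.abs(x[:, None] - b) + c * np.exp(-d * x[:, None])
    # Step 2: noisy samples as in (7.5).
    xs = rng.uniform(-1, 1, m)
    xis = f0(xs) + rng.uniform(-delta, delta, (m, n))
    # Step 3: random positive semi-definite matrices S = A'A and T = B'B.
    A, B = rng.uniform(1, 3, (n, n)), rng.uniform(1, 3, (n, n))
    S, T = A.T @ A, B.T @ B
    # Steps 4 to 6: fit with K and with G, keeping the best error over the sigma grid.
    grid = np.linspace(-1, 1, 401)
    truth = f0(grid)
    best = lambda parts: min(
        relative_error(fit_predict(parts, xs, xis, s, grid), truth) for s in sigmas
    )
    return best([(gauss, S)]), best([(gauss, S), (poly, T)])

rng = np.random.default_rng(0)
e_K, e_G = one_instance(n=2, delta=0.1, rng=rng)
print("E_K =", e_K, "E_G =", e_G)
```

Repeating `one_instance` 50 times for each pair $(n, \delta)$ would produce the groups of instances analyzed in the text, up to the assumptions listed above.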

We call $(\mathcal{E}_K, \mathcal{E}_G)$ obtained in each simulation an instance of approximation errors. Hence, we have 50 instances for each pair $(n, \delta)$. They are said to form a group. There are 12 groups of instances of approximation errors. For each $(n, \delta)$, we shall calculate the mean and standard deviation of the difference $\mathcal{E}_K - \mathcal{E}_G$ in the corresponding group as a measurement of the difference in the performance of learning schemes (7.1) and (7.6). Before that, outliers of instances should be excluded. Although we do not know the distributions of $\mathcal{E}_K$ and $\mathcal{E}_G$, we shall use the three-sigma rule in statistics. In other words, we regard an instance $(\mathcal{E}_K, \mathcal{E}_G)$ as an outlier if the deviation of $\mathcal{E}_K$ or $\mathcal{E}_G$ from their respective mean in the group exceeds three times their respective standard deviation. There are 32 outliers among the entire 600 instances, which are listed below in Table 7.1.
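The three-sigma rule described above can be written in a few lines. The sketch below is one possible reading of it; the function name `split_outliers` and the use of the sample standard deviation (`ddof=1`) are assumptions, since the text does not specify which estimator is used.

```python
import numpy as np

def split_outliers(errors):
    """Split one group of (E_K, E_G) instances into kept instances and outliers.

    An instance is an outlier if either component deviates from its group mean
    by more than three times the group standard deviation of that component.
    """
    errors = np.asarray(errors, dtype=float)
    mean = errors.mean(axis=0)
    std = errors.std(axis=0, ddof=1)  # assumed: sample standard deviation
    is_outlier = np.any(np.abs(errors - mean) > 3.0 * std, axis=1)
    return errors[~is_outlier], errors[is_outlier]

# Hypothetical group of 50 instances with a single large value of E_K.
i = np.arange(50)
group = np.column_stack([0.01 + 0.001 * np.sin(i), 0.005 + 0.0005 * np.cos(i)])
group[7, 0] = 0.5
kept, outliers = split_outliers(group)
print(len(kept), len(outliers))
```

The mean and standard deviation of $\mathcal{E}_K - \mathcal{E}_G$ would then be computed from `kept[:, 0] - kept[:, 1]`.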






Table 7.1 Outliers of instances of approximation errors $(\mathcal{E}_K, \mathcal{E}_G)$. An instance $(\mathcal{E}_K, \mathcal{E}_G)$ is considered to be an outlier if the deviation of one of its components from the respective mean in the group is more than three times the standard deviation of the group. Outliers are listed in an independent table because they should be excluded from the calculation of the mean and standard deviation of the approximation errors. Another reason is that adding them would make the plot of the approximation errors highly disproportionate.

                  $n = 2$             $n = 4$             $n = 8$             $n = 16$
$\delta = 0.1$    (0.1024, 0.0084)    (0.0215, 0.0182)    (0.0230, 0.0070)    (0.0712, 0.0015)
                  (0.0091, 0.0081)    (0.4095, 0.0034)    (0.0513, 0.0091)    (0.0364, 0.0124)
                  (0.4128, 0.0006)                        (0.1554, 0.0011)
                  (0.6783, 0.0025)                        (0.1464, 0.0026)
$\delta = 0.3$    (0.0286, 0.0228)    (0.0663, 0.0321)    (0.0407, 0.0194)    (0.1592, 0.0018)
                  (0.4811, 0.0020)    (0.1892, 0.0041)    (0.1809, 0.0023)    (0.0309, 0.0127)
                                      (0.1674, 0.0095)                        (0.0229, 0.0099)
$\delta = 0.5$    (0.2053, 0.0020)    (0.0377, 0.0376)    (0.2445, 0.0028)    (0.1612, 0.0043)
                  (0.1267, 0.0034)    (0.3547, 0.0033)    (0.2762, 0.0020)    (0.0541, 0.0081)
                  (0.0669, 0.0465)                        (0.0119, 0.0264)


We make a few observations from Table 7.1. Firstly, $\mathcal{E}_G$ is smaller than $\mathcal{E}_K$ except for only one instance. For a large portion of the outliers, the approximation error $\mathcal{E}_K$ is considerably large (larger than 10%), a sign of underfitting of the kernel $K$. Those instances are of the greatest interest to us as we desire to see if the refinement kernel $G$ can provide a remedy when underfitting does happen. We see from Table 7.1 that for all of those outliers, the refinement kernel $G$ always brings the relative approximation error down to less than 1%. The improvement brought by $G$ for other instances is also significant. The observations indicate that (7.6) performs significantly better in learning the function (7.4) from finite examples than (7.1). For further comparison, we compute the mean and standard deviation of the difference $\mathcal{E}_K - \mathcal{E}_G$ of the approximation errors after excluding the above outliers. The results are tabulated below. Note that a positive value of the mean implies that (7.6) performs better than (7.1). It is worthwhile to point out that among the remaining 568 instances excluding the outliers, there are only 33 where $\mathcal{E}_G$ is larger than $\mathcal{E}_K$. The largest value of $\mathcal{E}_G - \mathcal{E}_K$ is 0.0020. Therefore, we conclude that for all the $(n, \delta)$, (7.6) is superior to (7.1), and the larger the standard deviation in Table 7.2 is, the greater improvement the refinement kernel $G$ brings.

Table 7.2 The mean and standard deviation (in parentheses) of $\mathcal{E}_K - \mathcal{E}_G$. The outliers of instances listed in Table 7.1 are not counted toward these calculations. If they were added, the improvement brought by the refinement kernel $G$ would have been more dramatic.

                  $n = 2$            $n = 4$            $n = 8$            $n = 16$
$\delta = 0.1$    0.0098 (0.0182)    0.0139 (0.0335)    0.0160 (0.0241)    0.0108 (0.0135)
$\delta = 0.3$    0.0076 (0.0144)    0.0141 (0.0245)    0.0143 (0.0208)    0.0188 (0.0259)
$\delta = 0.5$    0.0054 (0.0121)    0.0127 (0.0307)    0.0103 (0.0186)    0.0091 (0.0102)
We shall also plot the 12 groups of approximation errors $\mathcal{E}_K, \mathcal{E}_G$ for a visual comparison. To this end, we take out the instances for which $\mathcal{E}_K$ is too large to have an appropriate range in the vertical axes of the figures. Therefore, Figures 7.1 and 7.2 do not fully reflect the improvement of (7.6) over (7.1). Nevertheless, one sees that the improvement brought by the refinement kernel $G$ in these relatively well-behaved instances is still dramatic.



Figure 7.1 Relative approximation errors $\mathcal{E}_K, \mathcal{E}_G$ for $n = 2, 4$ and $\delta = 0.1, 0.3, 0.5$. The outliers listed in Table 7.1 are not plotted here as they would make the figure highly disproportionate.

Figure 7.2 Relative approximation errors $\mathcal{E}_K, \mathcal{E}_G$ for $n = 8, 16$ and $\delta = 0.1, 0.3, 0.5$. The outliers listed in Table 7.1 are not plotted in the figure here.
7.2 Experiment two: overfitting

The target function we consider in the second experiment is 



CLk 



l + 25{x-bky 



X G 



-1,1], 



(7.8) 



where the components of the vectors $a, b, c, d \in \mathbb{R}^n$ will be randomly sampled by the uniform distribution from $[1,4]$, [0, ^], $[-2,2]$, and $[0,2]$ respectively in the numerical simulations. The dimension $n$ will be chosen from $\{2, 4, 8, 16\}$. We fix $m := 20$ and shall sample the inputs $x_j$, $j \in \mathbb{N}_m$, randomly by the uniform distribution from $[-1,1]$. Similarly, the outputs $\xi_j \in \mathbb{R}^n$, $j \in \mathbb{N}_m$, will be generated by (7.5) where the noise level $\delta$ is to be selected from $\{0.1, 0.3, 0.5\}$.

In the first step, we substitute the sample data $\{(x_j, \xi_j) : j \in \mathbb{N}_m\}$ into the regularization network (7.1) with the following kernel



K{x, y) := S'exp 



{x - yf 



+ T{l + xyf\ x,ye[-lM 



(7.9) 



where $S = A'A$ and $T = B'B$ with $A, B$ being $n \times n$ real matrices whose components will be randomly sampled by the uniform distribution from $[1,2]$. The target function (7.8) contains translations of the Runge function

$$\frac{1}{1 + 25x^2}, \quad x \in [-1,1].$$

It is well-known that approximating the Runge function by high order polynomial interpolations leads to overfitting. One sees by (7.3) that the regularization network (7.1) might be regarded as a regularized interpolation. Note also that the order of the polynomial kernel in (7.9) is 18, which is close to the number $m = 20$ of sampling points. Overfitting is hence expected. When this occurs, we propose to reduce the order of the polynomial kernel by considering



L{x,y) :-- 



5exp (-^^) + {xy)\ x, y G [-1, 1]. 



By Corollary 5.1, $\mathcal{H}_L \preceq \mathcal{H}_K$, namely, $K$ is a refinement of $L$. We shall demonstrate by numerical simulations that

$$\min_{f \in \mathcal{H}_L} \frac{1}{m}\sum_{j=1}^m \|f(x_j) - \xi_j\|_\Lambda^2 + \sigma\|f\|_{\mathcal{H}_L}^2 \tag{7.10}$$

outperforms (7.1) with the kernel (7.9). To this end, we shall conduct numerical experiments similar to those in the last subsection. Let $f_K$ and $f_L$ be the minimizers of (7.1) and (7.10), respectively. We shall measure the performance by the relative square approximation errors $\mathcal{E}_K$ and $\mathcal{E}_L$, which are defined in the same way as (7.7). For each pair $(n, \delta)$, where $n \in \{2, 4, 8, 16\}$ and $\delta \in \{0.1, 0.3, 0.5\}$, we run 20 numerical simulations where the regularization parameters $\sigma$ are chosen so that $\mathcal{E}_K$ and $\mathcal{E}_L$ are minimized, respectively. As in the first experiment, we shall calculate the mean and standard deviation of the difference $\mathcal{E}_K - \mathcal{E}_L$ in each group after taking out some outliers. We shall also plot the relative errors for comparison. The results are shown below in the form of tables and figures.
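The overfitting of high order polynomial interpolation of the Runge function mentioned above is easy to reproduce. The following Python sketch interpolates $1/(1 + 25x^2)$ at equispaced nodes, which is an assumption made for the illustration (the experiment itself samples the inputs randomly), and reports the maximum error on a fine grid. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def runge(x):
    return 1.0 / (1.0 + 25.0 * x ** 2)

def max_interpolation_error(degree, n_test=2001):
    """Interpolate the Runge function at degree + 1 equispaced nodes on [-1, 1]."""
    nodes = np.linspace(-1.0, 1.0, degree + 1)
    # With degree + 1 nodes the least-squares fit is the interpolating polynomial;
    # the Chebyshev basis is used only for numerical stability.
    interpolant = Chebyshev.fit(nodes, runge(nodes), degree, domain=[-1, 1])
    test = np.linspace(-1.0, 1.0, n_test)
    return float(np.max(np.abs(interpolant(test) - runge(test))))

for degree in (4, 10, 18):
    print(degree, max_interpolation_error(degree))
```

The error grows with the degree instead of shrinking, which is the behavior that motivates lowering the order of the polynomial part of the kernel.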
Table 7.3 Outliers of instances of relative approximation errors $(\mathcal{E}_K, \mathcal{E}_L)$.

             $\delta = 0.1$        $\delta = 0.3$        $\delta = 0.5$
$n = 2$      (0.9000, 0.7843)      (2.9906, 1.3509)      (1.8065, 0.8044)
                                                         (1.1332, 0.3213)
                                                         (19.6416, 7.6578)
$n = 4$      (8.2450, 5.8717)      (1.1760, 0.1354)      (4.6316, 7.0497)
             (1.6654, 2.0466)      (0.4591, 0.7845)      (2.0850, 1.3204)
             (18.9615, 12.0513)                          (2.4657, 1.1386)
             (0.9536, 1.0998)                            (5.7967, 0.6122)
                                                         (5.1196, 2.6692)
$n = 8$      (0.9102, 1.3862)      (1.3517, 1.8339)      (0.6369, 0.3698)
             (1.2233, 0.9489)      (0.8450, 0.2605)      (0.6945, 0.2878)
             (0.6711, 0.2249)      (0.3571, 0.7221)      (2.2371, 2.4008)
                                   (2.2403, 2.0108)      (1.0738, 0.4172)
                                   (5.6153, 5.0954)      (1.0561, 0.3067)
                                   (2.0763, 1.3718)      (0.6791, 1.0980)
                                   (2.2567, 1.4024)      (3.6689, 3.9566)
                                                         (1.1238, 0.2467)
$n = 16$     (4.4905, 5.8886)      (26.0758, 7.6125)     (73.0854, 42.6904)
             (7.9187, 4.3445)      (1.2255, 0.3181)      (1.6070, 1.4224)
             (2.1619, 0.5061)      (0.5140, 0.1817)      (3.2674, 2.2622)
             (17.5145, 13.7894)    (2.4289, 1.9022)      (2.1632, 1.7059)
                                                         (2.8067, 0.5791)
                                                         (9.0120, 3.5443)
                                                         (0.6064, 0.3365)
                                                         (4.0484, 0.4220)
                                                         (1.0064, 0.8287)



We have more outliers compared to the first experiment. Using fewer sampling points and approximating the Runge function by polynomials both contribute to this. We observe that for the majority of these outliers, $\mathcal{E}_L$ is significantly smaller than $\mathcal{E}_K$, showing improvement of learning scheme (7.10) over (7.1). For further comparison, we shall compute the mean and standard deviation of $\mathcal{E}_K - \mathcal{E}_L$ and plot the relative approximation errors $\mathcal{E}_K$ and $\mathcal{E}_L$ for the rest of the instances.

Table 7.4 The mean and standard deviation (in parentheses) of $\mathcal{E}_K - \mathcal{E}_L$. The outliers of instances listed in Table 7.3 are not counted toward these calculations. If they were added, the improvement brought by the kernel $L$ would have been more dramatic.

                  $n = 2$            $n = 4$            $n = 8$            $n = 16$
$\delta = 0.1$    0.0289 (0.0846)    0.0511 (0.0587)    0.0173 (0.0779)    0.0157 (0.0146)
$\delta = 0.3$    0.0404 (0.0922)    0.0661 (0.0705)    0.0671 (0.0929)    0.0657 (0.0918)
$\delta = 0.5$    0.0629 (0.1098)    0.0130 (0.0233)    0.0484 (0.0758)    0.0625 (0.0821)


A positive value of the mean in Table 7.4 implies that (7.10) performs better than (7.1). It is observed that kernel $L$ brings improvement for all the choices of $n \in \{2, 4, 8, 16\}$ and $\delta \in \{0.1, 0.3, 0.5\}$. We also remark that among all the 188 instances counted in Table 7.4, there are only 32 for which $\mathcal{E}_L > \mathcal{E}_K$. The mean and standard deviation of $\mathcal{E}_L - \mathcal{E}_K$ for these 32 instances are 0.0264 and 0.0306. We conclude that compared to (7.1), (7.10) improves the performance considerably in learning the function (7.8).



Figure 7.3 Relative approximation errors $\mathcal{E}_K, \mathcal{E}_L$ for $n = 2, 4$ and $\delta = 0.1, 0.3, 0.5$. The outliers listed in Table 7.3 are not plotted here as they would make the figure highly disproportionate.

[Plot panels titled by $(n, \delta)$, from $n = 2$, $\delta = 0.1$ to $n = 4$, $\delta = 0.5$; legend: kernel $K$, kernel $L$.]

Figure 7.4 Relative approximation errors $\mathcal{E}_K, \mathcal{E}_L$ for $n = 8, 16$ and $\delta = 0.1, 0.3, 0.5$. The outliers listed in Table 7.3 are not plotted here.
8 Conclusion

The refinement relationship between two operator-valued reproducing kernels provides a promising way of updating kernels for multi-task machine learning when overfitting or underfitting occurs. We establish several general characterizations of the refinement relationship. Particular attention has been paid to the case when the kernels under investigation have a vector-valued integral representation, the most general form of operator-valued reproducing kernels. By the characterizations, we present concrete examples of refining the translation invariant operator-valued reproducing kernels, Hessian of the scalar-valued Gaussian kernel, and finite Hilbert-Schmidt operator-valued reproducing kernels. Two numerical experiments confirm the potential usefulness of the proposed refinement method in updating kernels for multi-task learning. We plan to investigate the effect of the method on real application data on another occasion.